SYMMETRICAL FILTRATION OF ABSOLUTE DIFFERENCE

(57)Abstract: 

PROBLEM TO BE SOLVED: To provide an image processing peripheral device which efficiently performs various calculations for image processing.

SOLUTION: The proposed architecture is incorporated as a coprocessor 140 in a digital signal processor (DSP). It assists in calculating:

- the sum of absolute differences;
- symmetrical row/column FIR filtering with a downsampling (or upsampling) option;
- row/column discrete cosine transform (DCT) and inverse DCT (IDCT);
- general algebraic functions.

The architecture is composed of eight multiply-accumulate hardware units, which are connected in parallel with their paths routed and multiplexed. It depends upon a DMA controller 120 to retrieve data from, and write data back to, DSP memory without intervention by the DSP core 110. After setting up the DMA transfer and IPP/DMA synchronization in advance, the DSP moves on to its own processing task. Alternatively, the DSP can perform the data transfers and synchronization itself by synchronizing with the IPP architecture on these transfers.
[0018]

Row_filter(us, ds, length, block, data_addr, coef_addr, outp_addr)
Column_filter(us, ds, length, block, data_addr, coef_addr, outp_addr)
Row_filter_sym(us, ds, length, block, data_addr, coef_addr, outp_addr)
Sum_abs_diff(block, data_addr1, data_addr2, outp_addr)
Row_DCT(data_addr, outp_addr), Row_IDCT, Column_DCT, Column_IDCT
Vector_add(length, data_addr1, data_addr2, outp_addr)
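The table gives only function names and argument lists. The C sketch below is a hypothetical reference model of what a single-row Row_filter call might compute. It rests on three assumptions:

- us and ds are the upsampling and downsampling factors;
- samples and coefficients are 16-bit;
- only outputs whose taps lie entirely inside the row are produced.

The name row_filter_ref and its signature are illustrative. They are not the IPP interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference model for one row of a Row_filter(us, ds, ...) call.
 * Assumptions (not from the source): us = upsampling factor, ds = downsampling
 * factor, 16-bit samples/coefficients, 32-bit outputs, "valid" outputs only.
 * Conceptually the row is zero-stuffed by us, filtered with coef[0..taps-1],
 * and every ds-th result is kept. Returns the number of outputs written. */
size_t row_filter_ref(const int16_t *data, size_t length,
                      const int16_t *coef, size_t taps,
                      unsigned us, unsigned ds, int32_t *out)
{
    assert(us >= 1 && ds >= 1 && taps >= 1);
    size_t up_len = length * us;            /* length of the zero-stuffed row */
    size_t n_out = 0;
    for (size_t start = 0; start + taps <= up_len; start += ds) {
        int32_t acc = 0;
        for (size_t t = 0; t < taps; t++) {
            size_t n = start + t;           /* index into the zero-stuffed row */
            if (n % us == 0)                /* skip the inserted zeros */
                acc += (int32_t)coef[t] * data[n / us];
        }
        out[n_out++] = acc;
    }
    return n_out;
}
```

With us = 1 and ds = 2, a 3-tap filter keeps every second output. This is the filter-and-decimate step used to build lower-resolution images. Skipping the zero-stuffed positions (n % us != 0) is what makes upsampling cheaper than filtering an explicitly zero-padded row.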



[0025]

Receive_data_synchronization(signal, true/false), or wait_until_signal

Send_data_synchronization(signal, true/false), or assert_signal

Synchronization_completion(signal, true/false), or assert_signal

Call_subroutine(subroutine_addr)

Return()

Reset()
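These synchronization commands are named here but their mechanics are not spelled out. The C sketch below models one plausible reading:

- signals are bits in a shared flag word;
- a wait_until_signal command stalls the IPP command stream until its bit is set, then clears it;
- assert_signal sets a bit that the DSP can poll.

Several details are assumptions made for illustration: the flag word, the consume-on-wait behaviour, the ipp_cmd and ipp_state types, and the vector_add operand fields.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical command set: a subset of the commands above plus vector_add. */
typedef enum { CMD_WAIT_UNTIL_SIGNAL, CMD_ASSERT_SIGNAL, CMD_VECTOR_ADD, CMD_RETURN } cmd_op;

typedef struct { cmd_op op; unsigned signal; } ipp_cmd;

typedef struct {
    const ipp_cmd *prog;       /* IPP command list */
    size_t pc;                 /* index of the next command */
    uint32_t flags;            /* shared signal flags (one bit per signal) */
    const int16_t *a, *b;      /* vector_add inputs */
    int16_t *out;              /* vector_add output */
    size_t len;                /* vector length */
    bool done;                 /* set when CMD_RETURN executes */
} ipp_state;

/* Run the IPP command stream until it returns or stalls on an unset signal.
 * Returns true if at least one command executed. */
bool ipp_run(ipp_state *s)
{
    assert(s != NULL && s->prog != NULL);
    bool progressed = false;
    while (!s->done) {
        const ipp_cmd *c = &s->prog[s->pc];
        switch (c->op) {
        case CMD_WAIT_UNTIL_SIGNAL:
            if (!(s->flags & (1u << c->signal)))
                return progressed;              /* stall: signal not yet asserted */
            s->flags &= ~(1u << c->signal);     /* consume the signal (assumption) */
            break;
        case CMD_ASSERT_SIGNAL:
            s->flags |= 1u << c->signal;        /* visible to the DSP */
            break;
        case CMD_VECTOR_ADD:
            for (size_t i = 0; i < s->len; i++)
                s->out[i] = (int16_t)(s->a[i] + s->b[i]);
            break;
        case CMD_RETURN:
            s->done = true;
            break;
        }
        s->pc++;
        progressed = true;
    }
    return progressed;
}
```

In a typical exchange, the DSP writes the operands, sets signal 0, and lets the IPP run. Once ipp_run returns, bit 1 tells the DSP that the vector_add result is ready. The model is single-threaded, so a stall appears as ipp_run returning before CMD_RETURN, not as a blocked thread.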



JP 2001-236496



[0065] The first MAC computes C0*(X0 + X2) + 2*C1*X1, and the second MAC computes C0*(X1 + X3) + 2*C1*X2, where

C0 = F0
C1 = 0.5*F1
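The two MAC expressions in [0065] evaluate a 3-tap symmetric filter [F0, F1, F0] with a pre-adder. When C0 = F0 and C1 = 0.5*F1:

C0*(X0 + X2) + 2*C1*X1 = F0*X0 + F1*X1 + F0*X2

Our reading of the 0.5 factor is that the centre tap also passes through the pre-adder, with both inputs equal to X1. Every MAC step then has the same C*(A + B) form. This is an interpretation; the legible text does not state it.

The C sketch below compares a direct FIR with that symmetric form for any odd tap count. The names fir_direct and fir_symmetric are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Direct FIR: y[k] = sum_t h[t] * x[k + t]; n multiplies per output. */
double fir_direct(const double *x, const double *h, size_t n, size_t k)
{
    double acc = 0.0;
    for (size_t t = 0; t < n; t++)
        acc += h[t] * x[k + t];
    return acc;
}

/* Symmetric FIR (h[t] == h[n-1-t], n odd) written with a pre-adder so every
 * step is C * (A + B). The centre tap uses C = 0.5 * h[mid] with
 * A = B = x[k + mid], mirroring C1 = 0.5 * F1. (n + 1) / 2 multiplies. */
double fir_symmetric(const double *x, const double *h, size_t n, size_t k)
{
    assert(n % 2 == 1);
    size_t mid = n / 2;
    double acc = 0.0;
    for (size_t t = 0; t <= mid; t++) {
        double c = (t == mid) ? 0.5 * h[mid] : h[t];
        acc += c * (x[k + t] + x[k + n - 1 - t]);   /* pre-add, then MAC */
    }
    return acc;
}
```

Take h = {2, 6, 2} and x = {1, 2, 3, 4}. Output 0 is 2*(1 + 3) + 3*(2 + 2) = 20 and output 1 is 30 in both forms, matching the first and second MAC. The symmetric form needs (N + 1)/2 multiplies per output instead of N. That saving is why symmetric coefficients roughly halve the computation.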

[0078]
[Table 4]

dptr = dptr_init;   /* pointer initialization */
cptr = cptr_init;
optr = optr_init;

for (i1 = 0; i1 < lp1end; i1++) {
  for (i2 = 0; i2 < lp2end; i2++) {
    for (i3 = 0; i3 < lp3end; i3++) {
      for (i4 = 0; i4 < lp4end; i4++) {
        x[0..7] = dptr[0..7];
        /* or dptr[0], dptr[0,1], dptr[0,1,2,3] */
        y[0..7] = cptr[0..7];
        /* or cptr[0], cptr[0,1], cptr[0,1,2,3] */
        if (initialize_acc)
          acc[i4 * acc_index][0..7] = rnd_add[0..7];      /* load accumulators */
        acc[i4 * acc_index][0..7] += x[0..7] op y[0..7];  /* 8 lanes in parallel */
        if (write_back)
          optr[...] = acc[...][0..7];                     /* store results */
        dptr += ...;
        cptr += ...;
      }
    }
  }
}
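The loop nest in Table 4 can be exercised with a small executable model. The C sketch below:

- keeps two of the four loop levels;
- runs the eight lanes as an inner for loop;
- loads the accumulators with rnd_add on the first inner iteration;
- writes all eight accumulators back after the last inner iteration.

Several elements are assumptions for illustration: the loop_params fields, the broadcast flag standing in for the "or cptr[0]" load mode, and the choice of multiply or absolute difference as op. The acc_loop_level and acc_index controls are not modelled.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define LANES 8   /* eight MAC units */

typedef enum { OP_MAC, OP_ABSDIFF } lane_op;

/* Hypothetical parameter block for a two-level version of the loop nest. */
typedef struct {
    int n_blocks;      /* outer trip count (plays the role of i3) */
    int n_steps;       /* inner trip count (plays the role of i4) */
    int dstep, cstep;  /* dptr / cptr increment per inner step */
    int dblock;        /* dptr advance between outer iterations */
    int cbroadcast;    /* 1: y[0..7] = cptr[0]; 0: y[0..7] = cptr[0..7] */
    lane_op op;        /* per-lane operation added into the accumulator */
    int32_t rnd_add;   /* value loaded into each accumulator at init */
} loop_params;

/* Runs the loop nest and writes n_blocks * LANES results through optr. */
void ipp_loop(const loop_params *p, const int16_t *dptr_init,
              const int16_t *cptr_init, int32_t *optr)
{
    assert(p->n_steps >= 1);
    const int16_t *dbase = dptr_init;
    for (int i3 = 0; i3 < p->n_blocks; i3++) {
        const int16_t *dptr = dbase;
        const int16_t *cptr = cptr_init;
        int32_t acc[LANES];
        for (int i4 = 0; i4 < p->n_steps; i4++) {
            for (int j = 0; j < LANES; j++) {   /* parallel lanes in hardware */
                if (i4 == 0)                    /* initialize_acc */
                    acc[j] = p->rnd_add;
                int32_t x = dptr[j];
                int32_t y = p->cbroadcast ? cptr[0] : cptr[j];
                acc[j] += (p->op == OP_MAC) ? x * y : abs(x - y);
            }
            dptr += p->dstep;
            cptr += p->cstep;
        }
        for (int j = 0; j < LANES; j++)        /* write_back after last step */
            *optr++ = acc[j];
        dbase += p->dblock;
    }
}
```

As a check, a 3-tap box filter over the row 0, 1, ..., 9 with broadcast coefficients gives lane j the value 3j + 3. A one-step absolute-difference pass over two 8-sample vectors gives the per-lane |a[j] - b[j]| terms. A sum-of-absolute-differences operation would then add those terms up.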

Reconfigurable SIMD Coprocessor Architecture for Sum of Absolute Differences and Symmetric Filtering (Scalable MAC Engine for Image Processing)

Field of the Invention

This invention relates in general to signal processing, and more specifically to Single Instruction Multiple Data (SIMD) coprocessor architectures that provide faster image and video signal processing. This processing includes one- and two-dimensional filtering, transforms, and other common tasks.

Background of the Invention
1 5 A pi'Oblem wbicli has ariseEi in image processing techaology is that Lwo-dimensiocal (2-D) 
filtering has a different addressing pattern than one-dimensional (1-D) filtering. Previous DSP processors and coprocessors, designed for 1-D, may have to be modified to process 2-D video signals. The desired end goal is to enable a digital signal processor (DSP) or coprocessor to perform image and video processing expediently. In image processing, the most useful operation is 1-D and 2-D filtering, which requires addressing the 2-D data and 1-D or 2-D convolution coefficients. When the convolution coefficients are symmetrical, an architecture that makes use of the symmetry can reduce computation time roughly in half. The primary bottleneck identified for most video encoding algorithms is that of motion estimation. The problem of motion estimation may be addressed by first convolving an image with a kernel to reduce it into lower resolution images. These images are then reconvolved with the same kernel to produce even lower resolution images. The sum of absolute differences may then be computed within a search window at each level to determine the best matching subimage for a subimage in the previous frame. Once the best match is found at lower resolution, the search is repeated within the corresponding neighborhood at higher resolutions. In view of the above, a need has arisen for an architecture capable of performing 1-D/2-D filtering, preferably symmetrical filtering as well, and the sum of absolute differences with equal efficiency. Previously, specialized hardware or general purpose DSPs were used to perform the operations of summing absolute differences and symmetric filtering in SIMD coprocessor architectures. Intel's MMX technology is similar in concept, although much more general purpose. Copending applications filed on February 4, 1998, titled "Multi-Multiply-Accumulate (Multi-MAC) Coprocessor Architecture", Serial No. 60/073,668 (TI-26868), and "DSP with Efficiently Connected Hardware Coprocessor", Serial No. 60/073,641 (TI-26867), embody the host processor/coprocessor interface and efficient Finite Impulse Response/Fast Fourier Transform (FIR/FFT) filtering implementations. This invention extends that work to several other functions.
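The block-matching step described above can be sketched as follows. This is a minimal, illustrative full search at a single resolution level, not the IPP's own datapath; the names `sad` and `best_match`, the square block size `n`, and the search `radius` are assumptions for the example. In the coarse-to-fine scheme, the same search would be repeated at each pyramid level with a small radius around the scaled-up vector from the level below.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]


def sad(cur: Frame, ref: Frame, cy: int, cx: int, ry: int, rx: int, n: int) -> int:
    """Sum of absolute differences between the n x n block of `cur` whose
    top-left corner is (cy, cx) and the n x n block of `ref` at (ry, rx)."""
    return sum(
        abs(cur[cy + i][cx + j] - ref[ry + i][rx + j])
        for i in range(n)
        for j in range(n)
    )


def best_match(cur: Frame, ref: Frame, cy: int, cx: int, n: int,
               radius: int) -> Optional[Tuple[int, int, int]]:
    """Full search: try every displacement (dy, dx) in [-radius, radius]^2
    that keeps the reference block inside the frame and return the
    (dy, dx, sad) with the smallest SAD (the first one found on ties)."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            ry, rx = cy + dy, cx + dx
            if 0 <= ry <= h - n and 0 <= rx <= w - n:
                cost = sad(cur, ref, cy, cx, ry, rx, n)
                if best is None or cost < best[2]:
                    best = (dy, dx, cost)
    return best
```

Each candidate position costs n*n subtract/absolute/accumulate steps, which is the inner loop a bank of parallel accumulators is suited to.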

Summary of the Invention
The proposed architecture is integrated onto a Digital Signal Processor (DSP) as a coprocessor to assist in the computation of sum of absolute differences, symmetrical row/column Finite Impulse Response (FIR) filtering with a downsampling (or upsampling) option, row/column Discrete Cosine Transform (DCT)/Inverse Discrete Cosine Transform (IDCT), and generic algebraic functions. The architecture is called IPP, which stands for image processing peripheral, and consists of 8 multiply-accumulate hardware units connected in parallel and routed and multiplexed together. The architecture can be dependent upon a Direct Memory Access (DMA) controller to retrieve and write back data from/to DSP memory without intervention from the DSP core. The DSP can set up the DMA transfer and IPP/DMA synchronization in advance, then go on to its own processing task. Alternatively, the DSP can perform the data transfers and synchronization itself by synchronizing with the IPP architecture on these transfers. This architecture implements 2-D filtering, symmetrical filtering, short filters, sum of absolute differences, and mosaic decoding more efficiently than the previously disclosed Multi-MAC coprocessor architecture (TI-26868, Serial No. 60/073,641, titled "Reconfigurable Multiple Multiply-Accumulate Hardware Co-Processor Unit", filed on January 4, 1998 and incorporated herein by reference). This coprocessor will greatly accelerate the DSP's capacity to perform common 2-D signal processing tasks. The architecture is also scalable, providing an integer speed-up in performance for each additional Single Instruction Multiple Data (SIMD) block added to the architecture (provided the DMA can handle data transfers among the DSP and the coprocessors at a rapid enough rate). This technology could greatly accelerate video encoding. This architecture may be integrated onto existing DSPs such as the Texas Instruments TMS320C54x and TMS320C6x. Each of these processors already contains a DMA controller for data transfers.
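To illustrate how symmetric coefficients roughly halve the work, here is a minimal sketch of a valid-mode symmetric FIR with a downsampling step. It assumes the two samples sharing a coefficient are pre-added before a single multiply; the function name, the valid-mode boundary handling, and the `ds` step semantics are assumptions for illustration, not the IPP's documented behavior.

```python
from typing import List, Tuple


def symmetric_fir(x: List[int], h: List[int], ds: int = 1) -> Tuple[List[int], int]:
    """Valid-mode FIR y[m] = sum_k h[k] * x[m*ds + k] for symmetric h
    (h[k] == h[N-1-k]). Samples that share a coefficient are pre-added, so
    each output needs ceil(N/2) multiplies instead of N.
    Returns (outputs, number_of_multiplies)."""
    n = len(h)
    if any(h[k] != h[n - 1 - k] for k in range(n)):
        raise ValueError("coefficients must be symmetric")
    half = n // 2
    y: List[int] = []
    mults = 0
    # Stepping the window start by ds computes only every ds-th output,
    # i.e. filtering and downsampling in one pass.
    for start in range(0, len(x) - n + 1, ds):
        acc = 0
        for k in range(half):
            acc += h[k] * (x[start + k] + x[start + n - 1 - k])  # pre-add, then one multiply
            mults += 1
        if n % 2:  # odd length: the centre tap has no partner
            acc += h[half] * x[start + half]
            mults += 1
        y.append(acc)
    return y, mults
```

For a 5-tap symmetric filter, each output takes 3 multiplies instead of 5; with `ds=2`, the outputs that downsampling would discard are never computed.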

Brief Description of the Drawings

The accompanying drawings, which are incorporated in and constitute a part of the specification, schematically illustrate a preferred embodiment of the invention and, together with the general description given above and the detailed description of the preferred embodiment given below, serve to explain the principles of the invention. These and other aspects of this invention are illustrated in the drawings, in which:

Figure 1 illustrates the combination of a digital signal processor core and a reconfigurable hardware coprocessor in accordance with this invention, with the coprocessor closely coupled to the internal bus of the DSP;

Figure 2 illustrates the memory map logical coupling between the digital signal processor core and the reconfigurable hardware co-processor of this invention;

Figure 3 illustrates a manner of using the reconfigurable IPP hardware co-processor of this invention;

Figure 4 illustrates an alternative embodiment of the combination of Figure 1 including two co-processors with a private bus in between;

Figure 5 illustrates an alternate connection between the DSP and the IPP coprocessor, where the coprocessor and its memory blocks form a subsystem which is loosely connected to the DSP on a system bus;

Figure 6 illustrates the IPP overall block diagram architecture according to a preferred embodiment of the invention;

Figure 7 illustrates the input formatter of the reconfigurable IPP hardware co-processor illustrated in Figure 6;

Figure 8 illustrates a schematic diagram of the IPP Datapath Architecture A, with 8 independent MACs;

Figure 9 illustrates the output formatter of the reconfigurable IPP hardware co-processor illustrated in Figure 6;



Figure 10 illustrates a diagram of the IPP datapath architecture B of one alternative adder configuration of the adder portion of the IPP, the single 8-tree adder, according to a preferred embodiment;

Figure 11 illustrates a diagram of the IPP datapath architecture C of another alternative adder configuration of the adder portion of the IPP, dual 4-trees with butterfly, according to a preferred embodiment;

Figure 12 illustrates a diagram of the IPP datapath architecture D of another alternative adder configuration of the adder portion of the IPP, quad 2-trees, according to a preferred embodiment;

Figure 13 illustrates a diagram of the IPP reconfigurable datapath architecture that includes the routing and multiplexing necessary to support the A/B/C/D configurations shown in Figures 8, 10, 11, and 12;

Figure 14 illustrates a diagram of a simplified version of the IPP reconfigurable datapath architecture, which supports the previous A and D versions without Pre-Add (Figures 8 and 12);

Figure 15 illustrates a diagram of another simplified version of the IPP datapath architecture, which has only 4 MACs and supports only the previous A version without Pre-Add;

Figure 16 illustrates the reformatting of the input coefficients to the Datapath block necessary to perform 3-tap FIR ROW filtering according to a preferred embodiment of the invention;

Figure 17 illustrates the reformatting of the input coefficients to the Datapath block necessary to perform 3-tap symmetric FIR ROW filtering according to a preferred embodiment of the invention;

Figure 18 illustrates from where in memory the input coefficients are read and to where the output coefficients are written, as necessary to perform 3-tap FIR column filtering according to a preferred embodiment of the invention;

Figure 19 illustrates a schematic of the data path block with a tree adder when the IPP is performing a sum of absolute differences operation according to a preferred embodiment of the invention;

Figure 20 illustrates the lesser density of the Red and Blue colors versus the Green color involved in a demosaic operation;

Figure 21 illustrates the reformatting of the data necessary to perform a ROW pass portion of the demosaic operation according to a preferred embodiment of the invention;

Figure 22 illustrates the reformatting of the data necessary to perform a COLUMN pass portion of the demosaic operation according to a preferred embodiment of the invention;

Figure 23 illustrates the reformatting of the input data necessary to perform a row-wise wavelet transform, similar to symmetric ROW filtering, according to a preferred embodiment of the invention;

Figure 24 illustrates the reformatting of the input data necessary to perform a column-wise wavelet transform, similar to column filtering, according to a preferred embodiment of the invention;

Figure 25 illustrates the post-multiplier adders of a split adder tree with butterfly configuration (C, Figure 11) necessary to implement the cross additions and subtractions of the row-wise Inverse Discrete Cosine Transform (IDCT);

Figure 26 illustrates the pre-multiply adders of a split adder tree with butterfly configuration (C, Figure 11), with the butterfly disabled, necessary to implement the cross additions and subtractions of the row-wise Discrete Cosine Transform (DCT);

Figure 27 illustrates the column-wise IDCT and DCT implemented in SIMD mode of operation, similar to the column FIR filtering; and

Figure 28 illustrates two of the 8 MAC units of Figure 14 in a more detailed drawing of components.



Detailed Description of the Preferred Embodiments

Figure 1 illustrates circuit 100 including digital signal processor core 110 and a reconfigurable IPP hardware co-processor 140. Figure 1 is the same as Figure 1 of co-pending application Serial No. 60/073,641, titled "Reconfigurable Multiple Multiply-Accumulate Hardware Co-processor Unit" and assigned to the same assignee, whose co-processor forms the basis of a preferred embodiment of this invention. In accordance with a preferred embodiment of this invention, these parts are formed in a single integrated circuit (IC). Digital signal processor core 110 may be of conventional design. The IPP is a memory mapped peripheral. Transferring data between the IPP's and the DSP's working memory can be carried out via the Direct Memory Access (DMA) controller 120 without intervention from the digital signal processor core 110. Alternatively, the DSP core 110 can handle data transfer itself via direct load/store to the IPP's working memories 141, 145 and 147. A combination of the two transfer mechanisms is also possible: the DMA can handle large data/coefficient transfers more efficiently, and the DSP can more efficiently write short commands directly to IPP command memory 141.
The reconfigurable IPP hardware co-processor 140 has a wide range of functionality and supports symmetrical/asymmetrical row/column filtering, 2-D filtering, sum of absolute differences, row/column DCT/IDCT and generic linear algebraic functions. Symmetrical row/column filtering is frequently used in up/down sampling to resize images to fit display devices. Two-dimensional filtering is often used for demosaic and for image enhancement in digital cameras. Sum of absolute differences is implemented in MPEG video encoding and in the H.263 and H.323 encoding standards for telephone line video conferencing. Row/column DCT/IDCT is implemented in JPEG image encoding/decoding and MPEG video encoding/decoding. Generic linear algebraic functions, including array addition/subtraction and scaling, are frequently used in imaging and video applications to supplement the filtering and transform operations. For example, digital cameras require scaling of pixels to implement gain control and white balancing.
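As a small example of the scaling mentioned above, the sketch below applies a per-channel fixed-point gain with rounding and saturation, the usual shape of gain control and white balancing on a fixed-point DSP. The Q12 gain format, the 8-bit saturation limit, and the function names are assumptions chosen for illustration; the source does not specify the IPP's scaling format.

```python
from typing import List, Sequence, Tuple

Q = 12  # assumed fixed-point format: real gain = gain_q / 2**Q


def scale_pixels(pixels: Sequence[int], gain_q: int, max_val: int = 255) -> List[int]:
    """Multiply each pixel by a Q12 gain, round half up, and saturate to [0, max_val]."""
    half = 1 << (Q - 1)  # rounding constant added before the right shift
    out = []
    for p in pixels:
        v = (p * gain_q + half) >> Q
        out.append(min(max(v, 0), max_val))  # saturate instead of wrapping
    return out


def white_balance(r: Sequence[int], g: Sequence[int], b: Sequence[int],
                  gains_q: Tuple[int, int, int]) -> Tuple[List[int], List[int], List[int]]:
    """Apply one gain per colour plane (R, G, B)."""
    return (scale_pixels(r, gains_q[0]),
            scale_pixels(g, gains_q[1]),
            scale_pixels(b, gains_q[2]))
```

In Q12, a gain of 6144 means 1.5 and 2048 means 0.5. The per-pixel multiply-round-shift is the kind of regular, array-wide operation the generic algebraic functions cover.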

In the preferred embodiment, reconfigurable IPP hardware co-processor 140 can be programmed to coordinate with direct memory access circuit 120 for autonomous data transfers independent of digital signal processor core 110. External memory interface 130 serves to interface the internal data bus 101 and address bus 103 to their external counterparts, external data bus 131 and external address bus 133, respectively. External memory interface 130 is conventional in construction. Integrated circuit 100 may optionally include additional conventional features and circuits. Note particularly that the addition of cache memory to integrated circuit 100 could substantially improve performance. The parts illustrated in Figure 1 are not intended to exclude the provision of other conventional parts. The conventional parts illustrated in Figure 1 are merely the parts most affected by the addition of reconfigurable hardware co-processor 140.
Reconfigurable IPP hardware co-processor 140 is coupled to other parts of integrated circuit 100 via data bus 101 and address bus 103. Reconfigurable IPP hardware co-processor 140 includes command memory 141, co-processor logic core 143, data memory 145, and coefficient memory 147. Command memory 141 serves as the conduit by which digital signal processor core 110 controls the operations of reconfigurable hardware co-processor 140. Co-processor logic core 143 is responsive to commands stored in command memory 141, which form a command queue, to perform co-processing functions. These co-processing functions involve exchange of data between co-processor logic core 143 and data memory 145 and coefficient memory 147. Data memory 145 stores the input data processed by reconfigurable hardware co-processor 140 and further stores the results of the operations of reconfigurable hardware co-processor 140. Coefficient memory 147 stores the unchanging or relatively unchanging process parameters, called coefficients, used by co-processor logic core 143. Though data memory 145 and coefficient memory 147 have been shown as separate parts, it would be easy to employ these merely as different portions of a single, unified memory. As will be shown below, for the multiple multiply-accumulate co-processor described, it is best if such a single unified memory has two read ports for reading data and coefficients and one write port for writing output data. As multiple-port memory takes up more silicon area than single-port memory of the same capacity, the memory system can be partitioned into blocks to achieve multiple access points. With such a memory configuration, it is desirable to equip the IPP with a memory arbitration and stalling mechanism to deal with memory access conflicts. It is believed best that the memory accessible by reconfigurable IPP hardware co-processor 140 be located on the same integrated circuit in physical proximity to co-processor logic core 143. This physical closeness is needed to accommodate the wide memory buses required by the desired data throughput of co-processor logic core 143.

Figure 2 illustrates the memory mapped interface between digital signal processor core 110 and reconfigurable IPP hardware coprocessor 140. Digital signal processor core 110 controls reconfigurable IPP hardware coprocessor 140 via command memory 141. In the preferred embodiment, command memory 141 is a first-in-first-out (FIFO) memory with a command queue. The write port of command memory 141 is memory mapped into a single memory location within the address space of digital signal processor core 110. Thus digital signal processor core 110 controls reconfigurable IPP hardware co-processor 140 by writing commands to the address serving as the input to command memory 141. Command memory 141 preferably includes two circularly oriented pointers. The write pointer 151 points to the location within command memory 141 where the next received command is to be stored. Each time there is a write to the predetermined address of command memory 141, write pointer 151 selects the physical location receiving the data. Following such a data write, write pointer 151 is updated to point to the next physical location within command memory 141. Write pointer 151 is circularly oriented in that it wraps around from the last physical location to the first physical location. Reconfigurable IPP hardware co-processor 140 reads commands from command memory 141 in the same order as they are received (FIFO) using read pointer 153. Read pointer 153 points to the physical location within command memory 141 storing the next command to be read. Read pointer 153 is updated to reference the next physical location within command memory 141 following each such read. Note that read pointer 153 is also circularly oriented and wraps around from the last physical location to the first physical location. Command memory 141 includes a feature preventing write pointer 151 from passing read pointer 153. This may take place, for example, by refusing the write and sending a memory fault signal back to digital signal processor core 110 when write pointer 151 and read pointer 153 reference the same physical location. Thus the FIFO buffer of command memory 141 can become full and refuse additional commands.
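The pointer behavior described above can be modeled in a few lines. This is a hedged software model, not the hardware: the class and exception names are hypothetical, and it uses the common convention of keeping one slot empty so that equal pointers always mean "empty". Under that convention, a write that would make the write pointer catch up with the read pointer is refused, which stands in for the memory fault signal.

```python
from typing import Any, List, Optional


class CommandFifoFull(Exception):
    """Stands in for the memory fault signal returned to the DSP core."""


class CommandFifo:
    """Circular command queue modeled on command memory 141 (illustrative).
    One slot is kept empty, so the usable capacity is size - 1."""

    def __init__(self, size: int) -> None:
        self.mem: List[Any] = [None] * size
        self.wr = 0  # write pointer (151): next location to store a command
        self.rd = 0  # read pointer (153): next command to execute

    def write(self, cmd: Any) -> None:
        nxt = (self.wr + 1) % len(self.mem)  # wrap from last to first location
        if nxt == self.rd:                   # write pointer may not pass read pointer
            raise CommandFifoFull("command FIFO full")
        self.mem[self.wr] = cmd
        self.wr = nxt

    def read(self) -> Optional[Any]:
        if self.rd == self.wr:               # queue empty
            return None
        cmd = self.mem[self.rd]
        self.rd = (self.rd + 1) % len(self.mem)
        return cmd
```

Because commands leave in arrival order and both pointers wrap, the DSP can keep posting commands while the co-processor drains them, until the write side catches up with the read side.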

Many digital signal processing tasks will use plural instances of similar functions. For example, the process may include several filter functions. Reconfigurable IPP hardware co-processor 140 preferably has sufficient processing capability to perform all of these filter functions in real time. The macro store area 149 can be used to store common functions in the form of subroutines, so that invoking one of these functions takes just a "call subroutine" command in the command queue 141. This reduces traffic on the command memory and potentially the memory requirement of the command memory as a whole. Figure 2 illustrates 3 subroutines A, B, and C residing in the macro store area 149, with each subroutine ending with a "return" command.
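The call/return mechanism described above can be sketched as a small interpreter. It is a minimal model under stated assumptions: commands are tuples, `("call", addr)` jumps to a subroutine in a macro store modeled as a flat list, `("return",)` ends it, and nested calls use an explicit return stack. The source does not say whether the IPP supports nesting; the stack here is an illustrative choice.

```python
from typing import Callable, List, Sequence, Tuple

Command = Tuple


def run(queue: Sequence[Command], macro_store: Sequence[Command],
        execute: Callable[[Command], None]) -> None:
    """Execute queued commands in order. A ("call", addr) command runs the
    subroutine starting at macro_store[addr] until its ("return",) command."""
    for cmd in queue:
        if cmd[0] != "call":
            execute(cmd)           # ordinary command straight from the queue
            continue
        stack: List[int] = []      # return positions inside the macro store
        pc = cmd[1]
        while True:
            op = macro_store[pc]
            if op[0] == "return":
                if not stack:      # end of the top-level subroutine
                    break
                pc = stack.pop()
            elif op[0] == "call":  # nested call
                stack.append(pc + 1)
                pc = op[1]
            else:
                execute(op)
                pc += 1
```

A single two-word `call` entry in the queue thus replaces the whole command sequence stored in the macro store.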

An alternative to the command FIFO/macro store combination is static command memory contents that the DSP sets up initially. The command memory can hold multiple command sequences, each ending with a "sleep" command. The DSP instructs the IPP to execute a particular command sequence by writing the starting address of the sequence to an IPP control register. The IPP executes the specified commands until it encounters the sleep command, whereupon it goes into standby mode, waiting for further instruction from the DSP.

Data memory 145 and coefficient memory 147 can both be mapped within the data address space of digital signal processor core 110. As illustrated in Figure 2, data bus 101 is bidirectionally coupled to memory 149. In accordance with the alternative embodiment noted above, both data memory 145 and coefficient memory 147 are formed as a part of memory 149. Memory 149 is also accessible by co-processor logic core 143 (not illustrated in Figure 2). Figure 2 illustrates three circumscribed areas of memory within memory 149. As will be further described below, reconfigurable hardware co-processor 140 performs several functions employing differing memory areas.

Integrated circuit 100 operates as follows. Either digital signal processor core 110 or DMA controller 120 controls the data and coefficients used by reconfigurable IPP hardware co-processor 140 by loading the data into data memory 145 and the coefficients into coefficient memory 147 or, alternatively, both the data and the coefficients into unified memory 149. Digital signal processor core 110 may be programmed to perform this data transfer directly or, alternatively, digital signal processor core 110 may be programmed to control DMA controller 120 to perform this data transfer. Particularly for audio or video processing applications, the data stream is received at a predictable rate and from a predictable device. Thus it would typically be efficient for digital signal processor core 110 to control DMA controller 120 to make transfers from external memory to memory accessible by reconfigurable hardware co-processor 140.

Following the transfer of the data to be processed, digital signal processor core 110 signals reconfigurable IPP hardware co-processor 140 with the command for the desired signal processing algorithm. As previously stated, commands are sent to reconfigurable IPP hardware co-processor 140 by a memory write to a predetermined address within Command Queue 141. Received commands are stored in Command Queue 141 on a first-in-first-out basis. Each computational command of the reconfigurable IPP co-processor preferably includes a manner of specifying the particular function to be performed. In the preferred embodiment, the reconfigurable hardware co-processor is constructed to be reconfigurable. Reconfigurable IPP hardware co-processor 140 has a set of functional units, such as multipliers and adders, that can be connected together in differing ways to perform different but related functions. The set of related functions selected for each reconfigurable hardware co-processor will be based upon the similarity of the mathematics of the functions. This similarity in mathematics enables similar hardware to be reconfigured for the plural functions. The command may indicate the particular computation via an opcode in the manner of data processor instructions.

Each computational command includes a manner of specifying the location of the input data to be used by the computation. There are many suitable methods of designating data space. For example, the command may specify a starting address and the number of data words or samples within the block. The data size may be specified as a parameter, or it may be specified by the opcode defining the computation type. As a further example, the command may specify the data size, the starting address and the ending address of the input data. Note that known indirect methods of specifying where the input data is stored may be used. The command may include a pointer to a register or a memory location storing any number of these parameters, such as the start address, data size, number of samples within the data block, and end address.

Each computational command must further indicate the memory address range storing the output data of the particular command. This indication may be made by any of the methods listed previously with regard to the locations storing the input data. In many cases the computational function will be a simple filter function, and the amount of output data following processing will be about equivalent to the amount of input data. In other cases, the amount of output data may be more or less than the amount of input data. In any event, the amount of resultant data is known from the amount of input data and the type of computational function requested. Thus merely specifying the starting address provides sufficient information to indicate where all the resultant data is to be stored. It is feasible to store output data in a destructive manner, overwriting input data during processing. Alternatively, the output data may be written to a different portion of memory and the input data preserved, at least temporarily. The selection between these alternatives may depend upon whether the input data will be reused.
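The point that a starting address alone suffices for the output can be shown with a sketch that derives the output size from the input size and the function type, then checks whether the output would overwrite the input. The per-command size rules below (one word for a SAD result, element-wise for vector add, valid-mode filtering with every `ds`-th output kept) are hypothetical conventions chosen for illustration; the source does not define them.

```python
def output_count(command: str, length: int, taps: int = 1, ds: int = 1) -> int:
    """Hypothetical output-size rules: the result size follows from the
    input size and the function type, so only a start address is needed."""
    if command == "Sum_abs_diff":
        return 1                            # one accumulated SAD value
    if command == "Vector_add":
        return length                       # element-wise result
    if command in ("Row_filter", "Row_filter_sym"):
        return (length - taps) // ds + 1   # valid-mode filter, every ds-th output kept
    raise ValueError(f"unknown command {command!r}")


def output_range(outp_addr: int, count: int) -> range:
    """Addresses written by the command, derived from its start address."""
    return range(outp_addr, outp_addr + count)


def overwrites_input(data_addr: int, length: int, outp_addr: int, count: int) -> bool:
    """True if the output range overlaps the input range (destructive, in-place use)."""
    return outp_addr < data_addr + length and data_addr < outp_addr + count
```

A caller that still needs its input for a later function can use `overwrites_input` to pick a non-overlapping output start address.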
Figure 3 illustrates one useful technique involving alternately employing two memory areas. One memory area 145 stores the input data needed for the co-processor function. The relatively constant coefficients are stored in coefficient memory 147. The input data is recalled for use by co-processor logic core 143 (1 read) from a first memory area 144 of the data memory 145. The output data is written into the second memory area 146 of the data memory (1 write). Following use of the data memory area, direct memory access circuit 120 writes the data for the next block into the first memory area 144, overwriting the data previously used (2 write). At the same time, direct memory access circuit 120 reads data from second memory area 146 before it is overwritten by reconfigurable hardware co-processor 140 (2 read). These two memory areas for input data and for resultant data could be configured as circular buffers. In a product that requires plural related functions, separate memory areas defined as circular buffers can be employed, with one memory area configured as a circular buffer allocated to each separate function.
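The alternation above can be modeled as a step-by-step schedule over a circular buffer of block-sized areas. In step t, the DMA loads block t while the co-processor works on block t-1. This is an illustrative model only: the step granularity, the `slots` parameter and the function names are assumptions. The DMA and co-processor operations that run concurrently in hardware are simply listed side by side per step.

```python
from typing import Dict, List, Optional


def circular_schedule(num_blocks: int, slots: int = 2) -> List[Dict[str, Optional[int]]]:
    """Per step: which block the DMA loads into which slot, and which
    block (in which slot) the co-processor processes."""
    steps = []
    for t in range(num_blocks + 1):
        load = t if t < num_blocks else None   # block being written by the DMA
        proc = t - 1 if t > 0 else None        # block being read by the co-processor
        steps.append({
            "dma_block": load,
            "dma_slot": None if load is None else load % slots,
            "ipp_block": proc,
            "ipp_slot": None if proc is None else proc % slots,
        })
    return steps


def has_conflict(steps: List[Dict[str, Optional[int]]]) -> bool:
    """True if, in some step, the DMA overwrites the slot being processed."""
    return any(s["dma_slot"] is not None and s["dma_slot"] == s["ipp_slot"]
               for s in steps)
```

With two slots, the DMA never writes the area the co-processor is reading. With a single slot it would, which is why the technique alternates between areas.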

The format of computational commands preferably closely resembles the format of a subroutine call instruction in a high level language. That is, the command includes a command name, similar in function to the subroutine name, specifying the particular computational function to be performed. Each command also includes a set of parameters specifying the available options within the command type. For example, the following is a list of computational commands and their various parameters:

Row_filter(us, ds, length, block, data_addr, coef_addr, outp_addr)
Column_filter(us, ds, length, block, data_addr, coef_addr, outp_addr)
Row_filter_sym(us, ds, length, block, data_addr, coef_addr, outp_addr)
Sum_abs_diff(length, data_addr1, data_addr2, outp_addr)
Row_DCT(data_addr, outp_addr), Row_IDCT, Column_DCT, Column_IDCT
Vector_add(length, data_addr1, data_addr2, outp_addr)
These parameters may take the form of direct quantities or variables, which are pointers to registers or memory locations storing the desired quantities. The number and type of these parameters depend upon the command type. This subroutine call format is important in reusing programs written for digital signal processor core 110. Upon use, the programmer or compiler provides a stub subroutine to activate reconfigurable IPP hardware co-processor 140. This stub subroutine merely receives the subroutine parameters and forms the corresponding co-processor command using these parameters. The stub subroutine then writes this command to the predetermined memory address reserved for command transfers to reconfigurable hardware co-processor 140 and returns. This invention envisions that the computational capacity of digital signal processor cores will increase regularly with time. Thus the processing requirements of a particular product may require the combination of digital signal processor core 110 and reconfigurable IPP hardware co-processor 140 at one point in time. At a later point in time, the available computational capacity of an instruction set digital signal processor core may increase so that the functions previously requiring a reconfigurable IPP hardware co-processor may be performed in software by the digital signal processor core. The prior program code for the product may then be easily converted to the new, more powerful digital signal processor. This is achieved by providing independent subroutines for each of the commands supported by the replaced reconfigurable hardware co-processor. Each place where the original program employs the subroutine stub to transmit a command is then replaced by the corresponding subroutine call. Extensive reprogramming is thus avoided.
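The stub-versus-software substitution can be sketched with two implementations of the same `Vector_add` signature. One forms a command and posts it to the command port. The other computes the result directly, as a later, faster DSP might. The opcode value, the one-word-per-parameter command encoding, and modeling the memory-mapped command address as a Python list are all hypothetical.

```python
from typing import Callable, List

OPCODE_VECTOR_ADD = 0x05  # hypothetical opcode value

VectorAdd = Callable[[int, int, int, int], None]


def make_hw_vector_add(port: List[int]) -> VectorAdd:
    """Stub: packs the parameters into a co-processor command, writes it to
    the (modeled) memory-mapped command address, and returns immediately."""
    def vector_add(length: int, data_addr1: int, data_addr2: int, outp_addr: int) -> None:
        for word in (OPCODE_VECTOR_ADD, length, data_addr1, data_addr2, outp_addr):
            port.append(word)  # each append models one write to command memory
    return vector_add


def make_sw_vector_add(mem: List[int]) -> VectorAdd:
    """Software replacement with the same parameter list, for a DSP fast
    enough to do the work itself."""
    def vector_add(length: int, data_addr1: int, data_addr2: int, outp_addr: int) -> None:
        for i in range(length):
            mem[outp_addr + i] = mem[data_addr1 + i] + mem[data_addr2 + i]
    return vector_add
```

Application code written against the `vector_add(length, data_addr1, data_addr2, outp_addr)` signature runs unchanged with either implementation.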

Following completion of processing on one block of data, the data may be transferred out of data memory 145. This second transfer can take place either by direct action of digital signal processor core 110 reading the data stored at the output memory locations or through the aid of direct memory access circuit 120. This output data may represent the output of the process. In this event, the data is transferred to a utilization device. Alternatively, the output data of reconfigurable IPP hardware co-processor 140 may represent work in progress. In this case, the data will typically be temporarily stored in memory external to integrated circuit 100 for later retrieval and further processing.

Reconfigurable IPP hardware co-processor 140 is then ready for further use. This further use may be additional processing of the same function. In this case, the process described above is repeated on a new block of data in the same way. Alternatively, this further use may be processing of another function. In this case, the new block of data must be loaded into memory accessible by reconfigurable IPP hardware co-processor 140, the new command loaded, and then the processed data read for output or further processing.
Reconfigurable IPP hardware co-processor 140 preferably will be able to perform more than one function of the product algorithm. The advantage of operating on blocks of data rather than discrete samples will be evident when reconfigurable IPP hardware co-processor 140 operates in such a system. As an example, suppose that reconfigurable IPP hardware co-processor 140 performs three functions, A, B and C. These functions may be sequential, or they may be interleaved with functions performed by digital signal processor core 110. Reconfigurable IPP hardware co-processor 140 first performs function A on a block of data. This function is performed as outlined above. Digital signal processor core 110, either directly or by control of direct memory access circuit 120, loads the input data into data memory 145. Upon issue of the command configuring function A, which specifies the amount of data to be processed, reconfigurable IPP hardware co-processor 140 performs function A and stores the resultant data back into the portion of memory 145 specified by the command. A similar process causes reconfigurable IPP hardware co-processor 140 to perform function B on data stored in memory 145 and return the result to memory 145. Function A may operate on data blocks of a size unrelated to the size of the data blocks for function B. Finally, reconfigurable IPP hardware co-processor 140 is commanded to perform function C on data within memory 145, returning the result to memory 145. The block size for performing function C is independent of the block sizes selected for functions A and B.

The usefulness of block processing is seen from this example. The three functions A, B and C will typically perform amounts of work related to one common data processing size (for example, one 16 x 16 block of pixels as a final output), which is not necessarily equal to their actual input/output sizes due to filter history and up/down sampling among the functions. Providing special hardware for each function would sacrifice the generality of functionality and the reusability of reconfigurable hardware. Further, it would be difficult to match the resources granted to each function in hardware so as to provide a balance and the best utilization of the hardware. When reconfigurable hardware is used, there is inevitably an overhead cost for switching between configurations. Operating on a sample-by-sample basis for flow through the three functions would require the maximum number of such reconfiguration switches. This scenario would clearly be less than optimal. Thus operating each function on a block of data before reconfiguring to switch between functions would reduce this overhead. Additionally, it would then be relatively easy to allocate resources between the functions by selecting the amount of time devoted to each function. Lastly, such block processing would generally require less control overhead from the digital signal processor core than switching between functions at a sample level.
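To put numbers on the reconfiguration overhead, the sketch below counts configuration switches when functions A, B and C each process `block` samples before the hardware is reconfigured for the next function. It is a simplification: one block size is shared by all three functions, whereas the text allows independent block sizes. The function name is hypothetical.

```python
from typing import Sequence


def reconfigurations(num_samples: int, block: int,
                     functions: Sequence[str] = ("A", "B", "C")) -> int:
    """Count configuration switches when every function processes `block`
    samples before switching to the next (same block size for all)."""
    order = []
    for _start in range(0, num_samples, block):
        order.extend(functions)  # one configuration per (block, function) pair
    return sum(1 for prev, cur in zip(order, order[1:]) if prev != cur)
```

For 256 samples, sample-by-sample operation (`block=1`) needs 767 switches, 16-sample blocks need 47, and a single 256-sample block needs only 2.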

The block sizes selected for the various functions A, B and C will depend upon the relative data rates required and the data sizes. In addition, the tasks assigned to digital signal processor core 110 and their respective computational requirements must also be considered. Ideally, both digital signal processor core 110 and reconfigurable IPP hardware co-processor 140 would be nearly fully loaded. This would result in optimum use of the resources. The amount of work that should be assigned to the IPP depends on the speedup factor of the IPP co-processor 140 versus the DSP core 110. For example, when the IPP is 4 times faster than the DSP, the optimum workload is to assign 80% of the work to the IPP and 20% to the DSP, accomplishing a total speedup of 5 times. Such balanced loading may only be achieved in product algorithms with fixed and known functions and a stable data rate. This should be the case for most imaging and video applications. If the computational load is expected to change with time, then it will probably be best to dynamically allocate computational resources between digital signal processor core 110 and reconfigurable IPP hardware co-processor 140. In this case it is best to keep the functions performed by reconfigurable IPP hardware co-processor 140 relatively stable and vary only the functions performed by digital signal processor core 110.
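The 80%/20% split in the example follows from equalizing the two finishing times. Measure time in units of the DSP doing all the work alone, and let s be the IPP speedup and f the fraction of the work given to the IPP. The IPP then takes f/s and the DSP takes 1 - f. Setting these equal gives f = s/(s + 1), an elapsed time of 1/(s + 1), and a total speedup of s + 1. The sketch below, with a hypothetical function name, checks this with exact fractions.

```python
from fractions import Fraction
from typing import Tuple, Union


def balance(speedup: Union[int, Fraction]) -> Tuple[Fraction, Fraction]:
    """Return (fraction of work assigned to the IPP, overall speedup) when the
    DSP and the IPP are loaded so that both finish at the same time."""
    s = Fraction(speedup)
    f_ipp = s / (s + 1)     # solves f/s == 1 - f
    ipp_time = f_ipp / s    # in units of the DSP-only processing time
    dsp_time = 1 - f_ipp
    total_speedup = 1 / max(ipp_time, dsp_time)
    return f_ipp, total_speedup
```

For s = 4 this gives f = 4/5 and a total speedup of 5, matching the example above.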

The command set of reconfigurable IPP hardware co-processor 140 preferably includes several non-computational instructions for control functions:

Receive_data_synchronization(signal, true/false), or wait_until_signal
Send_data_synchronization(signal, true/false), or assert_signal
Synchronization_completion(signal, true/false), or assert_signal
Call_subroutine(subroutine_addr)
Return()
Reset()
Sleep()
Write_parameter(parameter, value)

These control functions will be useful in cooperation between digital signal processor core 110 and reconfigurable IPP hardware co-processor 140. The first of these commands is a receive_data_synchronization command. This command can also be called a wait_until_signal command. This command will typically be used in conjunction with data transfers handled by direct memory access circuit 120. Digital signal processor core 110 will control the process by setting up the input data transfer through direct memory access circuit 120. Digital signal processor core 110 will send two commands to reconfigurable IPP hardware co-processor 140. The first command is the receive data synchronization command and the second command is the desired computational command.

Reconfigurable IPP hardware co-processor 140 operates on commands stored in command queue 141 on a first-in-first-out basis. Upon reaching the receive data synchronization command, reconfigurable IPP hardware co-processor 140 will stop. Reconfigurable IPP hardware co-processor 140 will remain idle until it receives the indicated control signal from direct memory access circuit 120 indicating completion of the input data transfer. Note that direct memory access circuit 120 may be able to handle plural queued data transfers. This is known in the art as plural DMA channels. In this case, the receive data synchronization command must specify the hardware signal corresponding to the DMA channel used for the input data transfer.

Following the completed receive data synchronization command, reconfigurable IPP hardware co-processor 140 advances to the next command in command queue 141. In this case, this next command is a computational command using the data just loaded. Since this computational command cannot start until the previous receive data synchronization command completes, this assures that the correct data has been loaded.
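The in-order behavior described above can be modeled in a few lines. This is a toy simulation, not the co-processor's actual command encoding: the tuple command formats, signal names such as `dma_ch0_done`, the class name `IppCommandQueue`, and the choice that a satisfied wait consumes its signal are all assumptions made for illustration. An outward `assert_signal` command is included so that the same model covers the send and completion commands described next.

```python
from collections import deque


class IppCommandQueue:
    """Toy model of an in-order (FIFO) command queue such as command queue 141.

    Hypothetical command formats:
      ("wait_until_signal", name)  stall until the named hardware signal is raised
      ("compute", label, fn)       run a computational command (fn takes no args)
      ("assert_signal", name)      report a signal outward (to DMA or the DSP)
    """

    def __init__(self):
        self.pending = deque()
        self.raised = set()   # signals raised by other units, e.g. DMA channels
        self.asserted = []    # signals this co-processor has asserted
        self.executed = []    # trace of completed commands

    def submit(self, *cmd):
        self.pending.append(cmd)

    def raise_signal(self, name):
        self.raised.add(name)

    def run(self):
        """Execute commands until the queue is empty or blocked.

        Returns True if execution is blocked on an unsatisfied wait."""
        while self.pending:
            cmd = self.pending[0]
            kind = cmd[0]
            if kind == "wait_until_signal":
                if cmd[1] not in self.raised:
                    return True              # stay idle; head command not consumed
                self.raised.discard(cmd[1])  # assumption: the wait consumes the signal
            elif kind == "compute":
                cmd[2]()
            elif kind == "assert_signal":
                self.asserted.append(cmd[1])
            self.executed.append(cmd[1])
            self.pending.popleft()
        return False


results = []
q = IppCommandQueue()
q.submit("wait_until_signal", "dma_ch0_done")
q.submit("compute", "vector_add", lambda: results.append([a + b for a, b in zip([1, 2], [10, 20])]))
q.submit("assert_signal", "ipp_done")
blocked_before = q.run()          # stalls at the wait; nothing computed yet
results_before = list(results)
q.raise_signal("dma_ch0_done")    # DMA channel 0 reports its transfer complete
blocked_after = q.run()           # wait satisfied, compute runs, ipp_done asserted
```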

The combination of the receive data synchronization command and the computational command reduces the control burden on digital signal processor core 110. Digital signal processor core 110 need only set up direct memory access circuit 120 to make the input data transfer and send the pair of commands to reconfigurable IPP hardware co-processor 140. This would assure that the input data transfer had completed prior to beginning the computational operation. This greatly reduces the amount of software overhead required by digital signal processor core 110 to control the function of reconfigurable IPP hardware co-processor 140. Otherwise, digital signal processor core 110 may need to receive an interrupt from direct memory access circuit 120 signaling the completion of the input data load operation. An interrupt service routine must be initiated to service the interrupt. In addition, such an interrupt would require a context switch from the interrupted process to the interrupt service routine, and another context switch to return from the interrupt. Consequently, the receive data synchronization command frees up considerable capacity within digital signal processor core 110 for more productive use.

Another non-computational command is a send data synchronization command. The send data synchronization command is nearly the inverse of the receive data synchronization command, and actually asserts the signal specified. Upon reaching the send data synchronization command, reconfigurable IPP hardware co-processor 140 asserts a signal which then triggers a direct memory access operation. This direct memory access operation reads data from data memory 145 for storage at another system location. This direct memory access operation may be preset by digital signal processor core 110 and is merely begun upon receipt of a signal from reconfigurable IPP hardware co-processor 140 upon encountering the send data synchronization command. In the case in which direct memory access circuit 120 supports plural DMA channels, the send data synchronization command must specify the hardware signal that would trigger the correct DMA channel for the output data transfer. Alternatively, the send data synchronization command may specify the control parameters for direct memory access circuit 120, including the DMA channel if more than one channel is supported. Upon encountering such a send data synchronization command, reconfigurable IPP hardware co-processor 140 communicates directly with direct memory access circuit 120 to set up and start an appropriate direct memory access operation.

Another possible non-computational command is a synchronization completion command, which is actually another application of the assert_signal command. Upon encountering a synchronization completion command, reconfigurable IPP hardware co-processor 140 sends a signal to digital signal processor core 110. Upon receiving such a signal, digital signal processor core 110 is assured that all prior commands sent to reconfigurable IPP hardware co-processor 140 have completed. Depending upon the application, it may be better to sense this signal via interrupt or by DSP core 110 polling a hardware status register. It may also be better to queue several operations for reconfigurable IPP hardware co-processor 140 using send and receive data synchronization commands and then interrupt digital signal processor core 110 at the end of the queue. This may be useful for higher level control functions by digital signal processor core 110 following the queued operations by reconfigurable IPP hardware co-processor 140.

The IPP also uses the following other control/synchronization commands: Sleep, Reset and Write_parameter. The write_parameter command is used to perform parameter updates. Parameters that are changed frequently can be incorporated into commands to be specified on each task. Parameters that are not often changed, such as output right shift, additional term for rounding, saturation low/high bounds, saturation low/high set values, and operand size (8/16 bit), can be updated using the write_parameter command.
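The slowly changing parameters listed above describe how an accumulator value becomes a stored output. The sketch below assumes one plausible ordering (add the rounding term, arithmetic right shift, then replace values outside the saturation bounds with the corresponding set values); the source does not specify the order, and the `OutputParams` field names and the `write_parameter` helper are hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass
class OutputParams:
    """Hypothetical register set for the rarely changed output parameters."""
    right_shift: int = 0
    round_term: int = 0
    sat_low: int = -32768        # saturation low bound
    sat_high: int = 32767        # saturation high bound
    sat_low_value: int = -32768  # value written when the result is below sat_low
    sat_high_value: int = 32767  # value written when the result is above sat_high


def write_parameter(params: OutputParams, name: str, value: int) -> None:
    """Model of a write_parameter command: update one named parameter."""
    if name not in {f.name for f in fields(OutputParams)}:
        raise KeyError(name)
    setattr(params, name, value)


def postprocess(acc: int, p: OutputParams) -> int:
    """Round, shift and saturate one accumulator value (assumed ordering)."""
    y = (acc + p.round_term) >> p.right_shift   # Python >> is an arithmetic shift
    if y < p.sat_low:
        return p.sat_low_value
    if y > p.sat_high:
        return p.sat_high_value
    return y


q15 = OutputParams()
write_parameter(q15, "right_shift", 15)
write_parameter(q15, "round_term", 1 << 14)
half_times_half = postprocess(16384 * 16384, q15)   # 0.5 * 0.5 in Q15 gives 8192 (0.25)
pixel = OutputParams(sat_low=0, sat_high=255, sat_low_value=0, sat_high_value=255)
```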
The reconfigurable IPP hardware co-processor supports the following computational commands directly:

- Row/column 8-point DCT/IDCT

- Vector addition/subtraction/multiplication

- Scalar-vector addition/subtraction/multiplication

- Table lookup

- Sum of absolute differences
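For reference, the sum of absolute differences over two equal-length blocks is the sum of |a_i - b_i|. A common use of SAD in video processing is block matching, where the candidate position with the smallest SAD is chosen; the `best_match` search below is a hypothetical illustration of that use, not an IPP command.

```python
def sum_abs_diff(block_a, block_b):
    """Sum of absolute differences between two equal-length blocks."""
    if len(block_a) != len(block_b):
        raise ValueError("blocks must have the same length")
    return sum(abs(a - b) for a, b in zip(block_a, block_b))


def best_match(template, row):
    """Return (offset, sad) of the window of `row` that best matches `template`."""
    n = len(template)
    best = None
    for offset in range(len(row) - n + 1):
        sad = sum_abs_diff(template, row[offset:offset + n])
        if best is None or sad < best[1]:
            best = (offset, sad)
    return best
```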

In addition, through extension and special casing of the above generic computational commands, the IPP also supports:

- 2-D DCT/IDCT

- demosaicing by simple interpolation

- chroma subsampling

- wavelet analysis and reconstruction

- color suppression

- color conversion

- memory-to-memory moves

Each command will include pointers for relevant data and coefficient storage (input data) as well as addresses for output result data. Additionally, the number of filter taps, up/down sampling factors, the number of outputs produced, and various pointer increment options are attached to the computational commands. Because image processing is the application area, 2-D block processing is allowed whenever feasible.
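To make the role of these command fields concrete, the following sketch runs a one-dimensional FIR command described by a hypothetical `FirCommand` record. The field names, the correlation-style indexing (coefficient k multiplies sample n·D + k, equivalent to convolution with reversed coefficients), and the single output-stride option are assumptions; the actual command encoding is not given here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FirCommand:
    """Hypothetical descriptor for a filtering command."""
    in_ptr: int           # first input sample in data memory
    coef_ptr: int         # first coefficient in coefficient memory
    out_ptr: int          # first output location
    taps: int             # number of filter taps
    down: int = 1         # downsampling factor D
    num_outputs: int = 1  # number of outputs produced
    out_step: int = 1     # output pointer increment


def run_fir(cmd: FirCommand, data: List[int], coef: List[int], out: List[int]) -> None:
    """y[n] = sum_k coef[k] * data[n*D + k], written to out[out_ptr + n*out_step]."""
    for n in range(cmd.num_outputs):
        base = cmd.in_ptr + n * cmd.down   # input pointer advances by D per output
        acc = 0
        for k in range(cmd.taps):
            acc += coef[cmd.coef_ptr + k] * data[base + k]
        out[cmd.out_ptr + n * cmd.out_step] = acc


data = [1, 2, 3, 4, 5, 6, 7, 8]
out = [0] * 6
run_fir(FirCommand(in_ptr=0, coef_ptr=0, out_ptr=0, taps=2, down=2, num_outputs=3, out_step=2),
        data, [1, 1], out)
# out now holds [3, 0, 7, 0, 11, 0]
```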
Figure 4 illustrates another possible arrangement of circuit 100. Circuit 100 illustrated in Figure 4 includes two reconfigurable IPP hardware co-processors, 140 and 180. Digital signal processor core 110 operates with first reconfigurable IPP hardware co-processor 140 and second reconfigurable IPP hardware co-processor 180. A private bus 185 couples first reconfigurable IPP hardware co-processor 140 to second reconfigurable IPP hardware co-processor 180. These co-processors have private memories sharing the memory space of digital signal processor core 110. The data can be transferred via private bus 185 by one co-processor writing to the address range encompassed by the other co-processor's memory. Alternatively, each co-processor may have an output port directed toward an input port of another co-processor, with the links between co-processors encompassed in private bus 185. This construction may be particularly useful for products in which data flows from one type of operation handled by one co-processor to another type of operation handled by the second co-processor. This private bus frees digital signal processor core 110 from having to handle the data handoff either directly or via direct memory access circuit 120.

Alternatively, Figure 5 illustrates digital signal processor core 110 and a reconfigurable IPP hardware co-processor 140 loosely connected together via system bus 142. Digital signal processor core 110 may be of conventional design. In the preferred embodiment, reconfigurable IPP hardware co-processor 140 is adapted to coordinate with direct memory access circuit 120 for autonomous data transfers independent of digital signal processor core 110. The parts illustrated in Figure 5 are not intended to exclude the provision of other conventional parts. The system level connection in Figure 5 may be useful when digital signal processor core 110 in a particular implementation does not offer connection to its internal bus, for example when using catalog devices. Data transfer overhead is usually larger when IPP co-processor 140 is attached to the system bus, yet there is more system level flexibility, such as using multiple DSPs or multiple IPPs in the same system, and relative ease of changing or upgrading the DSP and IPP.

As an example of the communication between the DSP and the IPP, if the DSP is instructing the IPP to perform a vector addition task, these are the events that occur from the DSP's point of view. The DSP sets up the DMA transfer to send data to the IPP. Then the DSP sends a wait_until_signal command to the IPP (this signal will be asserted by the DMA controller once the transfer is completed). Next the DSP sends a vector_add command to the IPP, which frees up the DSP to perform other tasks. Now, either the DSP comes back to check on the completion status of the IPP, or alternatively, the DSP can be interrupted upon completion of the IPP task by an assert_signal command, which would follow the vector_add command. Finally, the DSP sets up the DMA to get the result back from the IPP. As mentioned previously, because there is some overhead in managing each data transfer and each computation command, the functionality of the IPP supports and encourages block computations. Another advisable practice is to perform cascaded tasks on the IPP for the same batches of data, to reduce data transfers, and thus reduce the DSP load as well as the system bus load and overall power consumption.

The IPP supports one-dimensional row-wise filtering when data is stored in rows. Certain combinations of upsampling and downsampling are supported as well. For example, the following 5 methods implement various up/down sampling options and constraints on filter length. Only configurations A and D (Figures 8 and 12) are considered here; there are many more methods in a fully reconfigurable IPP datapath (Figure 13).



Method                        Filter taps (Util = 1)   Upsampling factor   Downsampling factor
a) no up/down sampling        any                      1                   1
b) up sample in space-time    any                      8, 16, 24           any
c) up sample in space         any                      2, 4, 8             1
d) down sample in space       even                     1                   2
e) up sample in space-time    even                     4, 8, 12            any


Figures 6-15 illustrate the construction of an exemplary reconfigurable IPP hardware co-processor, with Figures 8 and 10-15 illustrating various datapath configurations. Figure 6 illustrates the overall block diagram of the general architecture of reconfigurable IPP hardware co-processor 140 according to a preferred embodiment of the invention. On the host's memory map, the IPP interface should appear as large contiguous memory blocks, for coefficients, data and macro-commands, and also as discrete control/status registers, for configuration, command queue, run-time control, etc. The configuration/command queue registers may very well sit on the host DSP's external bus in either I/O or memory address space. Multiple write addresses (with respect to the host) must be set up to modify less frequently changed parameters in the IPP, such as hardware handshake signaling, software reset, and so on. One write address for commands links to an internal command queue. There are a few additional write addresses for clearing interrupts, one for each interrupt. There is at least one read address for query of command completion status.

The data portion should map into the host's memory space, if possible. If the address space is insufficient, address and data ports should be separate, such that writing to the address port sets up an initial address, and subsequent reads/writes to the data port transfer contiguous data from/to the IPP data memory. In terms of IPP implementation, buffering is necessary between the outside 16/32 bit bus and the internal memory's 128 bit width. A small cache can be used for that purpose. A read-ahead technique for reading and write-back for writing can reduce the access time. Around 512 bits in this buffer, half for read and half for write, should be sufficient.
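The address/data port arrangement and the write-back half of the buffer can be sketched as follows. This is a toy model under stated assumptions: a 16-bit data port, a memory of 128-bit words held as Python integers with lane 0 in the most significant bits, and a single-word write buffer flushed by read-modify-write. The read-ahead half of the buffer is not modeled, and `IndirectPort` and its methods are hypothetical names.

```python
class IndirectPort:
    """Toy address-port/data-port pair in front of a 128-bit-wide memory."""

    LANES = 8  # 16-bit lanes per 128-bit word; lane 0 holds the most significant bits

    def __init__(self, num_words: int):
        self.mem = [0] * num_words  # one 128-bit word per entry, stored as an int
        self.addr = 0               # current 16-bit lane address (auto-increments)
        self.buf = {}               # lane -> value for the word being assembled
        self.buf_word = 0           # index of the word the buffer belongs to
        self.word_writes = 0        # 128-bit memory writes actually performed

    def _shift(self, lane: int) -> int:
        return 16 * (self.LANES - 1 - lane)

    def flush(self) -> None:
        """Merge buffered lanes into memory with one read-modify-write."""
        if not self.buf:
            return
        word = self.mem[self.buf_word]
        for lane, value in self.buf.items():
            s = self._shift(lane)
            word = (word & ~(0xFFFF << s)) | (value << s)
        self.mem[self.buf_word] = word
        self.buf.clear()
        self.word_writes += 1

    def write_address(self, lane_addr: int) -> None:
        self.flush()
        self.addr = lane_addr

    def write_data(self, value: int) -> None:
        word, lane = divmod(self.addr, self.LANES)
        if self.buf and word != self.buf_word:
            self.flush()                # moving to a new word: write the old one back
        self.buf_word = word
        self.buf[lane] = value & 0xFFFF
        self.addr += 1

    def read_data(self) -> int:
        self.flush()                    # keep reads coherent with buffered writes
        word, lane = divmod(self.addr, self.LANES)
        self.addr += 1
        return (self.mem[word] >> self._shift(lane)) & 0xFFFF


port = IndirectPort(num_words=2)
port.write_address(0)
for v in range(10):                     # ten 16-bit writes span two 128-bit words
    port.write_data(v)
port.flush()
port.write_address(0)
readback = [port.read_data() for _ in range(10)]
```

Ten 16-bit bus writes cost only two 128-bit memory writes here, which is the point of placing a buffer between the narrow bus and the wide memory.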

Three logical memory blocks, data memory A and B and command memory, are accessible from a system bus via an external bus interface. The memory interface handles memory arbitration between IPP 140 and system bus 142, as well as the simple First-In First-Out (FIFO) control involved in matching the system bus access width with the memory width. Data A and B are for input/output data and coefficients. Cascaded commands can reuse areas in the data memory, so the terms input/output are in the context of a single command. As previously mentioned, command queue 141 can receive commands from digital signal processor 110 via digital signal processor bus 142 and, in supplying those commands to execution control unit 190, controls the operation of reconfigurable IPP hardware co-processor 140. The control block steps through the desired memory access and computation functions indicated by the command. Command memory 141 is read by decode unit 142. To conserve memory, variable length commands are incorporated. Decode unit 142 sends the produced control parameters (one per command) to execution control unit 190, which uses the control parameters to drive a pipelined control path to fan out the control signals to the appropriate components. Control signals can be either fixed or time-varying in a command. They include memory access requests, input/output formatter control, and datapath control.

Data memory 145 and coefficient memory 147 are wide memory blocks (128 bits each) to support an 8-way parallel 16-bit datapath. This 128 bit wide memory block precludes the datapath from having to access memory every cycle. Data memory 145 receives relevant input data via the DSP bus and also stores the resultant data subsequent to processing through datapath core 170 and reformatting in output formatter 180. Coefficient data can also be received from DSP bus 142, or possibly provided in a look-up table within the IPP itself, and along with the input data, be processed through datapath core 170 and then reformatted in output formatter block 180. Data memory 145 and coefficient memory 147 may be written to in 128 bit words. This write operation is controlled by digital signal processor core 110 or direct memory access circuit 120, which, through the use of operand pointers in the commands, manage the two memory blocks. Address generator 150 generates the addresses for recall of data and coefficients used by the co-processor. This read operation operates on data words of 128 bits from each memory.

The recalled 128 bit data words from the Data and Coefficient Memories are supplied to input formatter 160. Input formatter 160 performs various shift and alignment operations, generally to arrange the 128 bit input data words into the order needed for the desired computation. Input formatter 160 outputs a 128 bit (8 by 16 bits) Data A, a 128 bit (8 by 16 bits) Data B and a 128 bit (8 by 16 bits) Coeff Data.
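A typical alignment task is presenting eight consecutive 16-bit samples that straddle two aligned 128-bit memory words, as a sliding-window filter needs. A minimal sketch, treating each 128-bit word as a list of eight lanes and assuming the earlier word occupies the first eight lanes of the 256-bit concatenation; `extract_window` and `sliding_windows` are hypothetical names:

```python
def extract_window(earlier, later, offset):
    """Return 8 consecutive 16-bit lanes starting `offset` lanes into `earlier`.

    `earlier` and `later` are consecutive aligned 128-bit words, each given as
    a list of eight lane values; together they form a 256-bit shifter input.
    """
    if not 0 <= offset < 8:
        raise ValueError("offset must be 0..7")
    lanes = list(earlier) + list(later)   # 16 lanes = 256 bits
    return lanes[offset:offset + 8]


def sliding_windows(words, start, count):
    """Yield `count` windows x[start+i : start+i+8] from aligned 8-lane words."""
    for i in range(count):
        word, offset = divmod(start + i, 8)
        yield extract_window(words[word], words[word + 1], offset)


words = [list(range(0, 8)), list(range(8, 16)), list(range(16, 24))]
windows = list(sliding_windows(words, 6, 3))
```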

These three data streams, Data A, Data B, and Coeff Data, are supplied to datapath 170. Datapath 170 is the operational portion of the co-processor. The datapath can be configured at run time to support a variety of image processing tasks. Figures 12 and 13 illustrate two preferred embodiments of the invention. Some tasks can be mapped into both configurations, each providing a different pattern of input/output memory access. These choices offer flexibility in the hands of application programmers to balance speed, data memory and sometimes power requirements. As will be further described below, datapath 170 includes plural hardware multipliers and adders that are connectable in various ways to perform a variety of multiply-accumulate operations. Datapath 170 outputs three adder data streams. Two of these three are 16 bit data words, while one of the three is a 128 bit word (8 by 16 bits). These three data streams supply the inputs to output formatter 180. Output formatter 180 rearranges the three data streams into eight 128 bit data words for writing back into the memory. The addresses for these two write operations are computed by address generator 150. This rearrangement may take care of alignment on memory word boundaries.

The operations of the co-processor are under control of control unit 190. Control unit 190 recalls the commands from command queue 141 and provides the corresponding control within co-processor 140.

The construction of input formatter 160 is illustrated in Figure 7. The two data streams Data A and Data B of 128 bits each are supplied to an input of multiplexers 205 and 207. Each multiplexer independently selects one input for storage in its corresponding register, 215 and 217 respectively. Multiplexer 205 may select either one of the input data streams or recycle the contents of register 215. Multiplexer 201 may select either the contents of register 215 or recycle the contents of its register 211. Multiplexer 207 may select either the other of the input data streams or recycle the contents of register 217. The lower bits of shifter 221 are supplied from register 215. The upper bits of shifter 221 are supplied by register 211. Shifter 221 shifts and selects among all 256 of its input bits; 128 bits are supplied to a full/4-way 64b x 2-1 multiplexer 231 and 128 bits are supplied to a full/1-way/4-way 128b x 3-1 multiplexer 235. The 128 bit output of multiplexer 231 is stored temporarily in register 241 and forms the Data A input to datapath 170. The 128 bit output of multiplexer 235 is stored temporarily in register 245 and forms the Data B input to datapath 170. The output of multiplexer 207 is supplied directly to a full/1-way/2-way/4-way 128b x 4-1 multiplexer 237 as well as to register 217. Multiplexer 237 selects the entire 128 bits supplied from register 217 and stores the result in register 247. This result forms the coefficient data input to datapath 170.

As mentioned previously, the three data streams, Data A, Data B, and Coeff Data, are supplied to datapath 170 for processing. Figure 8 illustrates a datapath architecture according to a first preferred embodiment of the invention, in which eight Multiply Accumulate Units (MACs) are connected in parallel ("A" configuration). The multiply-accumulate operation, in which the sum of plural products is formed, is widely used in signal processing, for example in many filter algorithms. N multiply-accumulate units (where N = 8 in this example) are operated in parallel to compute N output points. This configuration is suitable for a wide memory word that contains multiple pixels, typical for image processing. The feedback loop on the final row of adders contains multiple banks of accumulators to support upsampling. According to a preferred embodiment, each MAC is associated with 3 accumulators, and control unit 190 includes the necessary addressing mechanism for these accumulators. An accumulator depth of three is chosen in order to support color conversion, which involves 3 x 3 matrixing. Thus an accumulator depth of three simplifies implementation for color conversion.
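The following sketch models configuration A at the level of arithmetic only: eight MAC lanes that receive the same coefficient each cycle, each lane owning three accumulator banks. It shows the two uses named above, an 8-output FIR (one tap per cycle) and 3 x 3 color conversion (one bank per output channel). Pipelining, word widths and the formatters are ignored, and the class and function names are hypothetical.

```python
class ParallelMacs:
    """Toy model of configuration A: 8 MAC lanes, each with `depth` accumulator banks."""

    def __init__(self, lanes=8, depth=3):
        self.lanes = lanes
        self.acc = [[0] * lanes for _ in range(depth)]

    def clear(self):
        for bank in self.acc:
            for i in range(self.lanes):
                bank[i] = 0

    def cycle(self, bank, coef, data):
        """One cycle: the same coefficient multiplies every lane's data word."""
        row = self.acc[bank]
        for i in range(self.lanes):
            row[i] += coef * data[i]


def fir_8_outputs(x, h, n0, macs):
    """y[n0+i] = sum_k h[k] * x[n0+i+k] for i = 0..7, one tap per cycle."""
    macs.clear()
    for k, hk in enumerate(h):
        macs.cycle(0, hk, x[n0 + k:n0 + k + macs.lanes])
    return list(macs.acc[0])


def color_convert(planes, matrix, macs):
    """Apply a 3x3 matrix to 8 pixels; planes[c] holds 8 samples of channel c."""
    macs.clear()
    for r in range(3):          # output channel r accumulates in bank r
        for c in range(3):
            macs.cycle(r, matrix[r][c], planes[c])
    return [list(bank) for bank in macs.acc]
```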

Figure 9 illustrates the construction of output formatter 180 illustrated in Figure 6. The 16 bit dataword outputs of the first and second accumulators within reconfigurable IPP hardware co-processor 140 (Acc[0] and Acc[1]) form the first two inputs to output formatter 180, with the outputs of all 8 accumulators of reconfigurable IPP hardware co-processor 140 (Acc[0], Acc[1], Acc[2], Acc[3], Acc[4], Acc[5], Acc[6], Acc[7]) providing the third input to the output formatter. Eight 16 bit blocks are written to memory 145 subsequent to processing through the multiplexers and registers of output formatter 180.

Figure 10 illustrates the construction of datapath 170 according to a second preferred embodiment, illustrating a single 8-tree adder configuration ("B" configuration). Various segments of the Data A and Data B 128 bit (8 x 16 bit) dataword inputs to datapath 170, supplied from input formatter 160, are supplied to adders/subtractors (adders) 310, 320, 330, 340, 350, 360, 370 and 380. As shown, the first 16 bit datawords, Data A[0] and Data B[0], which represent the left most or most significant bits of the 128 bit output, are coupled to adder 310 and adder 320; the second 16 bit datawords, Data A[1] and Data B[1], are coupled to adder 330 and adder 340; the third 16 bit datawords, Data A[2] and Data B[2], are coupled to adder 350 and adder 360; and the fourth 16 bit datawords, Data A[3] and Data B[3], are coupled to adder 370 and adder 380. The result of this addition or subtraction of the first through fourth 16 bit datawords is stored in pipeline registers 312, 322, 332, 342, 352, 362, 372 and 382. This result is then multiplied by the Coeff Data, which for this configuration of the IPP consists of the same two 16 bit datawords. In other words, with the 8 MAC configuration shown in Figure 10, 4 data words and two coefficient words are fed to the hardware on each cycle. These same two coefficient words are used in every pair of adders to multiply the input data point with, and the products, which are stored in pipeline registers 316, 326, 336, 346, 356, 366, 376 and 386, are summed in adders 318, 338, 358 and 378. The results of those summations are summed in adders 328 and 368, the summations of which are added in adder 348. The output of adder 348 is accumulated in accumulator 349. The benefit of this configuration is that only one accumulator is required, albeit with 8 multipliers, to process the two 128 bit word outputs of input formatter 160.
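One way to use pre-adders feeding a single adder tree and one accumulator is a symmetric FIR filter: samples that share a coefficient are added before multiplication, so each coefficient is applied once. The sketch below illustrates that pattern under assumptions: the full filter is `h_half` followed by its mirror image, up to eight products are reduced per pass as an 8-input tree would, and the exact wiring of Figure 10 (two adders per Data A/Data B pair, two coefficient words per cycle) is not reproduced. Function names are hypothetical.

```python
def adder_tree(values):
    """Pairwise reduction, like the 8 -> 4 -> 2 -> 1 levels of an adder tree."""
    vals = list(values)
    while len(vals) > 1:
        if len(vals) % 2:
            vals.append(0)
        vals = [vals[i] + vals[i + 1] for i in range(0, len(vals), 2)]
    return vals[0]


def symmetric_fir_output(x, h_half, n, multipliers=8):
    """One output of a symmetric FIR whose taps are h_half + reversed(h_half).

    Pre-adders fold x[n+k] and x[n+taps-1-k], which share coefficient h_half[k];
    each pass multiplies up to `multipliers` folded samples, reduces them in an
    adder tree, and adds the tree output into a single accumulator.
    """
    taps = 2 * len(h_half)
    acc = 0
    for start in range(0, len(h_half), multipliers):
        ks = range(start, min(start + multipliers, len(h_half)))
        products = [h_half[k] * (x[n + k] + x[n + taps - 1 - k]) for k in ks]
        acc += adder_tree(products)
    return acc
```

Folding halves the number of multiplications for a symmetric filter, which is why symmetric filtering maps well onto pre-add, multiply and tree-sum hardware.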

Figure 11 illustrates the construction of datapath 170 according to a third preferred embodiment, illustrating a dual 4-tree with butterfly adder configuration ("C" configuration). Various segments of the Data A and Data B 128 bit (8 x 16 bit) dataword inputs to datapath 170, supplied from input formatter 160, are supplied to adders/subtractors (adders) 310, 320, 330, 340, 350, 360, 370 and 380. As shown, the first 16 bit datawords, Data A[0] and Data B[0], which represent the left most or most significant bits of the 128 bit output, are coupled to adder 310; the second 16 bit datawords, Data A[1] and Data B[1], are coupled to adder 320; the third 16 bit datawords, Data A[2] and Data B[2], are coupled to adder 330; the fourth 16 bit datawords, Data A[3] and Data B[3], are coupled to adder 340; the fifth 16 bit datawords, Data A[4] and Data B[4], are coupled to adder 350; the sixth 16 bit datawords, Data A[5] and Data B[5], are coupled to adder 360; the seventh 16 bit datawords, Data A[6] and Data B[6], are coupled to adder 370; and the eighth 16 bit datawords, or the least significant bits of the 128 bit output of input formatter 160, Data A[7] and Data B[7], are coupled to adder 380. The result of this addition or subtraction of the first through eighth 16 bit datawords is stored in pipeline registers 312, 322, 332, 342, 352, 362, 372 and 382. This result is then multiplied by the Coeff Data, which for this configuration of the IPP consists of two 16 bit words. In other words, with the 8 MAC configuration shown in Figure 11, 8 datawords and two coefficient words are fed to the hardware on each cycle. These same two coefficient words are used in every adder/multiplier portion of each MAC unit to multiply the input data point with, and the products, which are stored in pipeline registers 316, 326, 336, 346, 356, 366, 376 and 386, are summed in adders 318, 338, 358 and 378. The results of those summations are summed in adders 328 and 368. The summation from adder 328 is then subtracted from the summation from adder 368 in subtractor 388. The output from subtractor 388 is then accumulated in accumulator 359. The summation from adder 368 is also added to the summation from adder 328 in adder 348. The output of adder 348 is then accumulated in accumulator 349. The benefit of this configuration is that only two accumulators are required, albeit with 8 multipliers, to process the two 128 bit word outputs of input formatter 160.
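Arithmetically, each cycle of configuration C forms two 4-product sums and combines them in a butterfly, so one set of multiplications feeds both a sum output and a difference output, a pattern found in DCT/IDCT flowgraphs. The sketch follows the text above in producing the sum (adder 348) and the adder 368 sum minus the adder 328 sum; how the two coefficient words are distributed over the eight multipliers is left general here, and the names are hypothetical.

```python
def butterfly_cycle(data, coeffs):
    """One cycle of a dual 4-tree with butterfly (sketch).

    data, coeffs: eight values each (data already pre-added/subtracted).
    Returns (s0 + s1, s1 - s0), where s0 sums products 0-3 and s1 products 4-7.
    """
    p = [d * c for d, c in zip(data, coeffs)]
    s0 = (p[0] + p[1]) + (p[2] + p[3])   # pairwise sums, then the first 4-tree root
    s1 = (p[4] + p[5]) + (p[6] + p[7])   # pairwise sums, then the second 4-tree root
    return s0 + s1, s1 - s0


class ButterflyAccumulators:
    """Two accumulators, one for the sum path and one for the difference path."""

    def __init__(self):
        self.sum_acc = 0
        self.diff_acc = 0

    def cycle(self, data, coeffs):
        s, d = butterfly_cycle(data, coeffs)
        self.sum_acc += s
        self.diff_acc += d


acc = ButterflyAccumulators()
for _ in range(2):
    acc.cycle([1] * 8, [1, 2, 3, 4, 5, 6, 7, 8])
# acc.sum_acc is 72 and acc.diff_acc is 32
```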

Figure 12 illustrates the construction of datapath 170 according to a fourth preferred embodiment, wherein a quad adder configuration is illustrated ("D" configuration). Various segments of the Data A and Data B 128 bit (8 x 16 bit) dataword inputs to datapath 170, supplied from input formatter 160, are supplied to adders/subtractors (adders) 310, 320, 330, 340, 350, 360, 370 and 380. Two different input data schemes are envisioned. The first scheme provides 8 datawords and 2 coefficient words to the hardware each cycle. Downsampling of 2x is performed with the filtering. Each pair of MAC units performs two multiplications and accumulates the sum of the products. The second scheme provides 2 data words and 8 coefficient words to the hardware each cycle. Again, each pair of MAC units performs two multiplications, an addition and an accumulation. Upsampling is performed with the 4-way parallelism and optionally with the depth of each accumulator.

According to the first scheme, the first 16 bit datawords, Data A[0] and Data B[0], which represent the left most or most significant bits of the 128 bit output, are coupled to adder 310; the second 16 bit datawords, Data A[1] and Data B[1], are coupled to adder 320; the third 16 bit datawords, Data A[2] and Data B[2], are coupled to adder 330; the fourth 16 bit datawords, Data A[3] and Data B[3], are coupled to adder 340; the fifth 16 bit datawords, Data A[4] and Data B[4], are coupled to adder 350; the sixth 16 bit datawords, Data A[5] and Data B[5], are coupled to adder 360; the seventh 16 bit datawords, Data A[6] and Data B[6], are coupled to adder 370; and the eighth 16 bit datawords, Data A[7] and Data B[7], are coupled to adder 380. The result of this addition or subtraction of the first through eighth 16 bit datawords is stored in pipeline registers 312, 322, 332, 342, 352, 362, 372 and 382. This result is then multiplied by the Coeff Data, which for this configuration of the IPP consists of two 16 bit coefficient words. In other words, with the quad 2-tree adder configuration shown in Figure 12, 8 datawords and two coefficient words are fed to the hardware on each cycle. The same two coefficient words are used in every pair of MAC units to multiply the input data point with, and the products, which are stored in pipeline registers 316, 326, 336, 346, 356, 366, 376 and 386, are summed in adders 318, 338, 358 and 378. The summations from adders 318, 338, 358 and 378 are then accumulated in accumulators 319, 339, 359 and 379. The benefit of this configuration is that only four accumulators are required, albeit with 8 multipliers, to process the two 128 bit word outputs of input formatter 160.
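Under the first scheme, the quad 2-tree arrangement computes four outputs of a filter decimated by 2 while consuming two taps per cycle, consistent with the even-tap constraint listed for down sampling in the table above. A sketch of that mapping, assuming output m uses samples x[2m + k] (correlation-style indexing) and that each cycle's eight data words are the consecutive samples starting at 2·n0 + k; the function name is hypothetical:

```python
def decimate2_quad(x, h, n0=0):
    """Four outputs y[m] = sum_k h[k] * x[2m + k] for m = n0 .. n0+3.

    Each loop iteration models one cycle: two coefficient words (h[k], h[k+1])
    and eight consecutive data words; MAC pair j forms two products, adds them
    and accumulates the sum into its own accumulator (one output per pair).
    """
    if len(h) % 2:
        raise ValueError("two taps are consumed per cycle, so len(h) must be even")
    acc = [0, 0, 0, 0]                      # one accumulator per MAC pair
    base = 2 * n0
    for k in range(0, len(h), 2):
        window = x[base + k: base + k + 8]  # the eight data words for this cycle
        for j in range(4):
            acc[j] += h[k] * window[2 * j] + h[k + 1] * window[2 * j + 1]
    return acc
```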

Figure 13 illustrates the construction of datapath 170 that includes the routing and multiplexing necessary to support the 4 configurations, A, B, C, and D (Figures 8, 10, 11, and 12). Various segments of the Data A and Data B 128 bit (8 x 16 bit) dataword inputs to datapath 170, supplied from input formatter 160, are supplied to adders/subtractors (adders) 310, 320, 330, 340, 350, 360, 370 and 380. As shown, the first 16 bit datawords, Data A[0] and Data B[0], which represent the left most or most significant bits of the 128 bit output, are coupled to adder 310; the second 16 bit datawords, Data A[1] and Data B[1], are coupled to adder 320; the third 16 bit datawords, Data A[2] and Data B[2], are coupled to adder 330; the fourth 16 bit datawords, Data A[3] and Data B[3], are coupled to adder 340; the fifth 16 bit datawords, Data A[4] and Data B[4], are coupled to adder 350; the sixth 16 bit datawords, Data A[5] and Data B[5], are coupled to adder 360; the seventh 16 bit datawords, Data A[6] and Data B[6], are coupled to adder 370; and the eighth 16 bit datawords, Data A[7] and Data B[7], are coupled to adder 380. The result of this addition or subtraction of the first through eighth 16 bit datawords is stored in pipeline registers 312, 322, 332, 342, 352, 362, 372 and 382. This result is then multiplied by the Coeff Data, which for this configuration of the IPP consists of the same 16 bit dataword. In other words, with the 8 MAC configuration shown in Figures 8 and 13, 8 datawords and one coefficient dataword are fed to the hardware on each cycle. This same coefficient dataword is used in every MAC unit to multiply the input data point with, and the products, which are stored in pipeline registers 316, 326, 336, 346, 356, 366, 376 and 386, are accumulated in adders 318, 328, 338, 348, 358, 368, 378 and 388.

Actually, as shown in the routing and multiplexing diagram for configurations A/B/C/D of Figure 13, the products form one input to adders 318 through 388. The second input to adder 318 is formed by the output of multiplexer 319, which has two inputs: the first being the product from multiplier 324 and the second being the accumulated sum of adder 318. Adder 328 has multiplexers 325 and 329 on its two inputs. Multiplexer 325 selects between multiplier 324 and the output of adder 318. Multiplexer 329 selects between the accumulated result from adder 328 itself and that from the next adder 338. In the 8 MACs configuration (A, Figure 8), the pair of adders 318 and 328 implements separate accumulation of products from multipliers 314 and 324. In the quad 2-trees configuration (D, Figure 12), the pair of adders 318 and 328 implements summation of the products (by 318) followed by accumulation of the sums (by 328).

Similarly, the adder pair 338 and 348, the adder pair 358 and 368, and the adder pair 378 and 388 each implement either separate accumulation of products or accumulation of sums of 2 products. In the case of the summed-up accumulation supporting the quad 2-trees configuration, adders 348, 368 and 388 produce the final accumulated outputs, just like adder 328.

To support the dual 4-tree with butterfly configuration (C), multiplexers 319, 339, 359 and 379 are selected such that adders 318, 338, 358 and 378 sum up neighboring pairs of products from the 8 multipliers. Multiplexers 325 and 329 are selected such that adder 328 adds up the results of adders 318 and 338, and thus has the sum from the first 4 multipliers 314, 324, 334 and 344. Multiplexers 365 and 369 are similarly selected so that adder 368 has the sum from the last 4 multipliers 354, 364, 374 and 384. These 2 sums, at adders 328 and 368, are then routed to both adders 348 and 390, which implement the cross add/subtract operations. Adder 348 performs the addition, and adder 390 performs the subtraction. The results from adders 348 and 390 are next routed to adders 388 and 392, respectively, for accumulation. Adders 388 and 392 produce the final pair of outputs.

To support the single 8-tree configuration (B), the entire multiplexer configuration of the dual 4-tree with butterfly configuration (C) is retained. Adder 348 has the sum from all 8 multipliers, and adder 388 has the accumulated result. The output of adder 392 is simply ignored.
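As a hedged illustration of the routing in configurations B, C and D, the following C sketch models only the post-multiply adder network for one cycle, taking the eight products as given. The function names are hypothetical; the reference numerals in the comments simply map each step back to the adders described above.

```c
#include <stdint.h>

/* p[0..7] are the products of multipliers 314..384 in one cycle. */

/* Configuration D (quad 2-trees): adders 318/338/358/378 sum neighbouring
   product pairs; adders 328/348/368/388 accumulate those pair sums. */
void ipp_quad_2tree(int64_t acc[4], const int64_t p[8])
{
    for (int j = 0; j < 4; j++)
        acc[j] += p[2 * j] + p[2 * j + 1];
}

/* Configuration C (dual 4-tree with butterfly): adder 328 holds the sum of
   the first four products and adder 368 the sum of the last four; adder 348
   adds them, adder 390 subtracts them, and adders 388 and 392 accumulate.
   Configuration B (single 8-tree) uses the same routing and ignores acc[1]. */
void ipp_dual_4tree_butterfly(int64_t acc[2], const int64_t p[8])
{
    int64_t first4 = (p[0] + p[1]) + (p[2] + p[3]);  /* 318, 338 -> 328 */
    int64_t last4  = (p[4] + p[5]) + (p[6] + p[7]);  /* 358, 378 -> 368 */
    acc[0] += first4 + last4;                        /* 348 -> 388 */
    acc[1] += first4 - last4;                        /* 390 -> 392 */
}
```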

Figure 14 illustrates a simplified version of the reconfigurable datapath architecture. This simplified architecture supports both the parallel MACs of Figure 8 and the quad 2-trees of Figure 12. As shown, instead of the separate adders and multipliers illustrated in Figures 8 and 13, both the Data A and Data B inputs are applied to both a multiplier and an adder/subtractor (adder), and then the outputs of either the adders or the multipliers are selected before leaving the multiply/add/subtract blocks 810, 820, 830, 840, 850, 860, 870 and 880. A more in-depth illustration of a pair of the MAC units of Figure 14 is shown in Figure 28. Each MAC unit is capable of performing a pipelined single-cycle multiply-accumulate operation on two inputs, D_inp and C_inp. Accumulation of D_inp + C_inp or D_inp - C_inp instead of D_inp * C_inp is also possible, hence the add/subtract unit 310 placed in parallel with each multiplier 314. The multiplexer 610 chooses between the adder/subtractor 310 output or the multiplier 314 output. Between each pair of MAC units, there is also the quad 2-trees option (indicated by the AND gate 710) to add up the pair of results (D_inp */+/- C_inp) to produce ACC_inp, which feeds the accumulating adder 818.

As shown in Figure 14, both of the above-described configurations are implemented. Although only 8 adders (excluding those in parallel with the multipliers) are active at any given time, 12 physical adders are used in this design in order to reduce the cost of multiplexing and routing. The AND gates 710, 720, 730 and 740 on the cross path control whether or not the */+/- results should be added together. As shown in Figure 28, three accumulators 612, 614 and 616 are available in each MAC unit to implement upsampling. The accumulating adder 818 can select, via multiplexer 618, any of the three as input (with the other input being ACC_inp), or the half-unit quantity for rounding, RND_ADD. On the very first cycle of valid data on ACC_inp, RND_ADD should be the selected input.
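The behaviour of a Figure 28 MAC unit (operation select, the AND-gated quad 2-trees path, and three accumulators with RND_ADD on the first valid cycle) might be modelled as below. This is a sketch under assumptions: the `ipp_op` encoding, the function names and the per-accumulator `first_cycle` flag are hypothetical.

```c
#include <stdint.h>

/* Operation selected through multiplexer 610 (encoding is hypothetical). */
enum ipp_op { IPP_MUL, IPP_ADD, IPP_SUB };

/* Result of the multiply/add/subtract block for inputs D_inp and C_inp. */
int32_t ipp_op_result(enum ipp_op op, int16_t d_inp, int16_t c_inp)
{
    switch (op) {
    case IPP_ADD: return (int32_t)d_inp + c_inp;   /* adder/subtractor 310 */
    case IPP_SUB: return (int32_t)d_inp - c_inp;
    default:      return (int32_t)d_inp * c_inp;   /* multiplier 314 */
    }
}

/* Quad 2-trees option: AND gate 710 passes the neighbouring unit's result
   into ACC_inp when enabled and masks it to zero otherwise. */
int64_t ipp_acc_inp(int32_t own, int32_t neighbour, int quad_2tree)
{
    return (int64_t)own + (quad_2tree ? neighbour : 0);
}

/* One MAC unit with three accumulators (612, 614, 616). Multiplexer 618
   gives accumulating adder 818 either the selected accumulator or, on that
   accumulator's first valid cycle, the half-unit rounding constant RND_ADD. */
typedef struct { int64_t acc[3]; } ipp_mac_unit;

void ipp_accumulate(ipp_mac_unit *u, int sel, int first_cycle,
                    int64_t rnd_add, int64_t acc_inp)
{
    int64_t other = first_cycle ? rnd_add : u->acc[sel];
    u->acc[sel] = other + acc_inp;
}
```

For 2x upsampling, for example, successive cycles could alternate `sel` between accumulators 0 and 1, so that each phase builds up its own output.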

Rounding and saturation follow the main arithmetic datapath. With the half-unit quantity already added to the accumulated sum, rounding is simply a right shift.

Figure 15 illustrates a version of Figure 8 that is simplified further than the one illustrated in Figure 14. The configuration illustrated in Figure 15 comprises only 4 MAC units, versus the 8 MAC units illustrated in previous configurations, and does not contain the pre-add illustrated in Figures 8-14. As in Figures 14 and 28, in Figure 15 the Data A and Data B inputs are applied to both a multiplier 314 and an adder/subtractor (adder) 310, and the outputs of the adders and multipliers are then multiplexed together in multiplexers 610 and 620 (Figure 28). Because there is no pre-add, the outputs of multiplexers 610 and 620 are accumulated directly in accumulators 818, 828, 838 and 848. As previously described with reference to Figure 14, and as shown in Figure 28, three accumulators 612, 614 and 616 are available in each MAC unit to implement upsampling. The accumulating adder 818 can select, via multiplexer 618, any of the three as input (with the other input being ACC_inp), or the half-unit quantity for rounding, RND_ADD. On the very first cycle of valid data on ACC_inp, RND_ADD should be the selected input.

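Because RND_ADD already carries the half-unit quantity, the rounding step described above reduces to a right shift followed by saturation. A minimal sketch, assuming an arithmetic right shift on signed values and caller-supplied saturation thresholds; the helper names `ipp_rnd_add` and `ipp_round_saturate` are hypothetical.

```c
#include <stdint.h>

/* Half-unit constant for a right shift of `shift` bits: the value that
   would be selected as RND_ADD on the first valid cycle. */
int64_t ipp_rnd_add(unsigned shift)
{
    return shift ? (int64_t)1 << (shift - 1) : 0;
}

/* Rounding is a plain right shift once RND_ADD has been folded into the
   accumulator; the result is then clamped to [lo, hi]. */
int32_t ipp_round_saturate(int64_t acc, unsigned shift, int32_t lo, int32_t hi)
{
    int64_t v = acc >> shift;   /* assumes arithmetic shift for negatives */
    if (v < lo) return lo;
    if (v > hi) return hi;
    return (int32_t)v;
}
```

For example, with a 4-bit shift, RND_ADD is 8, so an accumulated 40 (2.5 in output units) becomes 48 and rounds to 3.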
In Figures 14 and 15, it is sometimes desirable to add an absolute difference operation to the multiply/add/subtract block. This will speed up motion estimation tasks in video encoding applications.

Figure 16 illustrates the input data formatting necessary to perform the IPP operation of row filtering. On the first cycle, the Data A input to all 8 MACs comprises the first 8 data words. Every cycle, the window of input data words used to feed the MACs is shifted one word to the right. The Data B input of all 8 MACs is fed the same coefficient word. In this example, a 3-tap FIR filter is implemented, so three coefficient words are provided.

In the figure, X0 ... X7 comprise the first Data A input to the MACs during a first clock cycle. Shifting by one data word, the second Data A input becomes X1 ... X8 during a second clock cycle. The Data A inputs continue in this manner, supplying each MAC with a consecutive sequence of data words. The first filter coefficient C0 is broadcast to all MACs for the first cycle, C1 is broadcast to all MACs for the second cycle, and C2 for the third cycle. At the third cycle, the MAC units have accumulated the correct outputs and can write back results to data memory. The data feed continues at X8 ... X15 to begin computing outputs Y8 ... Y15, and the coefficient feed wraps back to C0.
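The row-filter data feed just described can be sketched cycle by cycle in C. This is an illustrative model under stated assumptions: the name `ipp_row_fir` is hypothetical, the output count is taken to be a multiple of 8, and the coefficients are applied in feed order, so Y[i] = C0*X[i] + C1*X[i+1] + C2*X[i+2] for the 3-tap example.

```c
#include <stdint.h>

/* Hypothetical model of the row-filter feed: for each block of 8 outputs,
   cycle k feeds Data A = x[base+k .. base+k+7] to the 8 MACs and broadcasts
   coefficient c[k]. x must hold at least nout + ntaps - 1 samples. */
void ipp_row_fir(const int16_t *x, const int16_t *c, int ntaps,
                 int32_t *y, int nout)
{
    for (int base = 0; base < nout; base += 8) {
        int32_t acc[8] = {0};
        for (int k = 0; k < ntaps; k++)          /* one clock cycle per tap */
            for (int i = 0; i < 8; i++)          /* 8 MACs in parallel */
                acc[i] += (int32_t)x[base + k + i] * c[k];
        for (int i = 0; i < 8; i++)              /* write back 8 outputs */
            y[base + i] = acc[i];
    }
}
```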

Maintaining the same configuration, an alternative output is rendered when, instead of supplying 8 data words and one coefficient word to the hardware, one data word and 8 coefficient words are provided for the 8 filter banks. Again, each MAC works independently, multiplying the same data word by its specific coefficient word and accumulating the products. Upsampling is performed with the 8-way parallelism and, optionally, with the multiple accumulators of each MAC unit.

Figure 17 illustrates the input data formatting necessary to perform a symmetric row filtering operation. In this example IPP implements a 3-tap filter, so the first and third coefficients are equivalent. Therefore, only two coefficient words are provided. On the first cycle, the Data A input comprises the first 8 data words X0 ... X7. The first Data B input comprises data words X2 ... X9. In addition, the first coefficient supplied to all the multipliers is C0. The second Data A input is the first Data A input shifted to the right by one word, or X1 ... X8. The second Data B input is the same 8 data words. Coefficient C1 is supplied to all the multipliers on the second cycle. Effectively, IPP computes

$$C_0 (X_0 + X_2) + 2 C_1 X_1 \quad \text{on the first MAC,}$$

$$C_0 (X_1 + X_3) + 2 C_1 X_2 \quad \text{on the second MAC,}$$

and so on. Let the desired filter coefficients be $F_0$, $F_1$, $F_2$, where $F_0 = F_2$. The supplied coefficients should relate to the desired coefficients by

$$C_0 = F_0, \qquad C_1 = F_1 / 2,$$

so that each MAC produces $F_0 X_i + F_1 X_{i+1} + F_2 X_{i+2}$.

At the end of the second cycle, the 3-tap filter outputs are ready to be stored back to data memory. On the third cycle, the Data A input is supplied with data words X8 ... X15, the Data B input is supplied with X10 ... X17, and the coefficient is wrapped back to C0.
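The symmetric scheme generalises to any odd-length symmetric filter, and a short C model makes the coefficient mapping explicit. This is a sketch, not the patent's implementation: `ipp_sym_row_fir` is a hypothetical name, floating point is used only to keep the halved centre coefficient exact, and the generalisation beyond 3 taps is an assumption. Cycle k pairs Data A = X[i+k] with Data B = X[i+N-1-k]; at the centre tap both inputs carry the same word, which is why the supplied centre coefficient is halved.

```c
/* Hypothetical model of symmetric row filtering with the pre-adder, for an
   odd-length filter f[0..ntaps-1] with f[k] == f[ntaps-1-k]. Only
   M + 1 cycles and coefficients are needed, where M = (ntaps - 1) / 2.
   x must hold at least nout + ntaps - 1 samples; nout is a multiple of 8. */
void ipp_sym_row_fir(const double *x, const double *f, int ntaps,
                     double *y, int nout)
{
    int m = (ntaps - 1) / 2;
    for (int base = 0; base < nout; base += 8) {
        double acc[8] = {0};
        for (int k = 0; k <= m; k++) {
            /* supplied coefficient: C[k] = F[k], except C[M] = F[M] / 2 */
            double ck = (k == m) ? f[m] / 2.0 : f[k];
            for (int i = 0; i < 8; i++) {
                double a = x[base + i + k];               /* Data A */
                double b = x[base + i + ntaps - 1 - k];   /* Data B */
                acc[i] += ck * (a + b);                   /* pre-add, multiply */
            }
        }
        for (int i = 0; i < 8; i++) y[base + i] = acc[i];
    }
}
```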

Figure 18 illustrates where in memory the data comes from to perform a column filter operation. The computational model and command syntax are similar to the row filter computational model and command syntax, except that data is stored in row-major order and inner products are performed along columns. For best efficiency, the data, coefficient and output arrays should all be aligned to an 8 x 16-bit memory word. As is shown in Figure 18, in this case the already-aligned data is taken directly from the memory word to the datapath. In other words, no input formatting of the data is necessary. Each coefficient is applied to all 8 MAC units in the parallel MACs configuration shown in Figures 8 and 10 through 13. An N-tap column filter takes N + 1 cycles to produce 8 outputs. There are N memory reads and 1 data memory write in each N + 1 cycles. When N > 8, there is one coefficient memory read every 8 cycles. Otherwise there is an initial read, and all subsequent coefficients are supplied by the register in the input formatter; no further read is needed. Coefficient read frequency is the same as in row filtering: 1 read/8 cycles if N > 8, and zero otherwise.
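A hedged C model of the column filter shows why no input formatting is needed and where the N + 1 count comes from. It assumes each image row is exactly one aligned 8 x 16-bit memory word and counts one data-memory access per cycle, as in the text; the name `ipp_col_fir` is hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of an N-tap column filter on data aligned to
   8 x 16-bit memory words (each row of img is one memory word). Each tap
   reads one whole row and broadcasts one coefficient to the 8 MACs; one
   write stores 8 outputs. Returns the number of data-memory accesses,
   i.e. N reads + 1 write per output row. */
long ipp_col_fir(int16_t img[][8], int rows_out,
                 const int16_t *c, int ntaps, int32_t out[][8])
{
    long accesses = 0;
    for (int r = 0; r < rows_out; r++) {
        int32_t acc[8] = {0};
        for (int k = 0; k < ntaps; k++) {
            const int16_t *word = img[r + k];   /* one data memory read */
            accesses++;
            for (int i = 0; i < 8; i++)
                acc[i] += (int32_t)word[i] * c[k];
        }
        for (int i = 0; i < 8; i++) out[r][i] = acc[i];
        accesses++;                             /* one data memory write */
    }
    return accesses;
}
```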

Figure 19 illustrates the IPP configuration necessary to perform the sum of absolute differences used to enhance the performance of video encoding. As shown in Figure 19, Data A comprises X0 ... X7 and Data B comprises Y0 ... Y7. Coefficient words are not required. The difference between each Data A input and the corresponding Data B input is calculated in subtractors 310, 320, 330, 340, 350, 360, 370 and 380, and those differences are stored in registers 312, 322, 332, 342, 352, 362, 372 and 382. Each difference is then multiplied by either plus or minus one in multipliers 314, 324, 334, 344, 354, 364, 374 and 384, depending upon whether the difference is positive or negative, in order to yield a non-negative number. Those products are stored in registers 316, 326, 336, 346, 356, 366, 376 and 386, then summed in adders 318, 338, 358 and 378, and those sums are summed in adders 328, 348 and 368. The sum of adder 348 is then accumulated in accumulator 349. For the sum of absolute differences we operate on 8-bit pixels, so the adders only have to be 12 bits wide, except for the final accumulator, which must be 16 bits wide. Saturation thresholds and rounding parameters can come from yet another bank of registers.
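The SAD datapath can be sketched in C as follows; the width argument is easy to check from it. Each absolute difference of 8-bit pixels is at most 255, so one cycle's tree sum is at most 8 * 255 = 2040 (within 12 bits), and a 16 x 16 block totals at most 256 * 255 = 65280 (within a 16-bit unsigned accumulator). The function names and the 16 x 16 block helper are hypothetical illustrations.

```c
#include <stdint.h>

/* Hypothetical model of one SAD cycle: 8 subtractors, a +/-1 "multiply"
   that makes each difference non-negative, a 3-level adder tree, and
   accumulation into a 16-bit accumulator. */
uint16_t ipp_sad_cycle(uint16_t acc, const uint8_t x[8], const uint8_t y[8])
{
    int16_t p[8];
    for (int i = 0; i < 8; i++) {
        int16_t d = (int16_t)(x[i] - y[i]);      /* subtractors 310..380 */
        p[i] = (int16_t)(d * (d < 0 ? -1 : 1));  /* sign multiply 314..384 */
    }
    int16_t s0 = p[0] + p[1], s1 = p[2] + p[3];  /* adders 318, 338 */
    int16_t s2 = p[4] + p[5], s3 = p[6] + p[7];  /* adders 358, 378 */
    int16_t t = (s0 + s1) + (s2 + s3);           /* adders 328, 368 -> 348 */
    return (uint16_t)(acc + t);                  /* accumulator 349 */
}

/* SAD of a 16 x 16 block with the given row stride: two cycles per row. */
uint16_t ipp_sad16x16(const uint8_t *cur, const uint8_t *ref, int stride)
{
    uint16_t acc = 0;
    for (int r = 0; r < 16; r++)
        for (int h = 0; h < 16; h += 8)
            acc = ipp_sad_cycle(acc, cur + r * stride + h, ref + r * stride + h);
    return acc;
}
```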

Figures 20, 21 and 22 illustrate the IPP operation of Discrete Sine/Cosine Demosaicing, including the steps of Row Pass and Column Pass. Most digital still cameras employ a color filter array in the imager that produces interleaved color information. Demosaicing is the process of obtaining the missing color components from available neighboring same-color components. A simple linear interpolation approach is often used, which can be represented by the diagram illustrated in Figure 20. The weights are either 0.5 or 0.25, depending upon whether there are 2 or 4 closest same-color neighbors (excluding boundary conditions).

The three colors are processed separately, with red processing essentially the same as blue. Each color is processed in two passes: a row pass and a column pass. The row pass is graphically represented in Figure 21. From each green/red line, one full green line and one full red line are generated. For the green component, row pass filtering is implemented by a 2-phase, 3-tap filter, with coefficients (0.5, 0, 0.5) and (0, 1, 0) for the two phases. For the red component, row pass filtering is implemented by the same 2-phase, 3-tap filter, with coefficients (0, 1, 0) and (0.5, 0, 0.5). Each blue/green line is processed similarly to generate a full blue line and a full green line.
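The two-phase row pass for one green/red line can be sketched in C. This sketch rests on assumptions not stated in the text: red samples sit at even positions and green at odd positions (the phase order implied by the coefficient lists), line edges use same-color mirroring, and the 0.5 weight is realised as a rounded average `(a + b + 1) >> 1`. The function names are hypothetical.

```c
#include <stdint.h>

/* Same-colour mirroring at the line ends: x[-1] -> x[1], x[n] -> x[n-2]. */
static int mirror(int i, int n)
{
    if (i < 0) return -i;
    if (i >= n) return 2 * n - 2 - i;
    return i;
}

/* Row pass for a green/red line (n >= 2). Each position uses a 3-tap filter
   whose coefficients alternate between the two phases:
     green: (0.5, 0, 0.5) at even positions, (0, 1, 0) at odd positions
     red:   (0, 1, 0) at even positions, (0.5, 0, 0.5) at odd positions */
void row_pass_green_red(const uint16_t *line, int n,
                        uint16_t *green, uint16_t *red)
{
    for (int i = 0; i < n; i++) {
        unsigned left  = line[mirror(i - 1, n)];
        unsigned right = line[mirror(i + 1, n)];
        unsigned avg   = (left + right + 1) >> 1;   /* (0.5, 0, 0.5) phase */
        if (i % 2 == 0) { red[i] = line[i]; green[i] = (uint16_t)avg; }
        else            { green[i] = line[i]; red[i] = (uint16_t)avg; }
    }
}
```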

Producing two color output rows from one input row should be merged into one command, using upsampling-like looping. It takes 6 cycles to process 8 input pixels. For each group of 6 cycles, there is one data memory read, two data memory writes, and three coefficient memory reads.

The implementation of the column pass for demosaicing the red/blue components is illustrated in Figure 22a. For the red and blue colors, two-tap column filtering is used. It takes three cycles to process 8 input pixels, during which there are two data memory reads and 1 data memory write; there are no steady-state coefficient memory reads.

The implementation of the column pass for demosaicing the green components is illustrated in Figure 22b. For the green color component, 2-phase, 3-tap column filtering is used, with coefficients (0.25, 0.5, 0.25) and (0, 1, 0). Eight input pixels are processed in 4 cycles. There are three data memory reads, one data memory write, and zero coefficient memory reads per group of 4 cycles.
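As a worked check, the separable row and column passes can be compared with the 0.5/0.25 weights of Figure 20. This check rests on two assumptions: a Bayer layout in which red/green rows alternate with green/blue rows, and a two-tap red/blue column filter that averages the rows directly above and below. The symbols $G_l$, $G_r$, $G_u$, $G_d$ (nearest green neighbours of a red site) and $R_{ul}$, $R_{ur}$, $R_{dl}$, $R_{dr}$ (diagonal red neighbours of a blue site) are introduced here for illustration only. At a red site, the green row pass yields $\tfrac12 (G_l + G_r)$, while the rows above and below keep their original greens, so the (0.25, 0.5, 0.25) column phase gives

$$\hat G = \tfrac14 G_u + \tfrac12 \cdot \tfrac12 (G_l + G_r) + \tfrac14 G_d = \tfrac14 (G_u + G_d + G_l + G_r).$$

At a blue site, the red row pass on the rows above and below yields the average of the two diagonal reds in each of those rows, and the two-tap column pass averages those two results:

$$\hat R = \tfrac12 \left[ \tfrac12 (R_{ul} + R_{ur}) + \tfrac12 (R_{dl} + R_{dr}) \right] = \tfrac14 (R_{ul} + R_{ur} + R_{dl} + R_{dr}).$$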

In sum, 13 cycles are spent on the interpolation scheme of demosaicing for 8 input pixels (6 for the row pass, 3 for the red/blue column pass and 4 for the green column pass). Out of these 13 cycles, 6 data memory reads, 4 data memory writes and 3 coefficient memory reads are performed.

Figure 23 illustrates the formatting of the input data to perform the IPP operation of wavelets, row pass. In image technology, wavelets are used for image compression/decompression and feature extraction, for example as a pre-processing stage for textural features. The wavelets operation can be implemented on any of the parallel 8 MAC configurations illustrated in Figures 8 and 10-13, or on the more simplified versions of Figures 14 and 15. The row pass of wavelets analysis is implemented as 2x upsampling, 2x downsampling (to achieve high/low frequency banks), row filtering.

Figure 24 illustrates where in memory the input data comes from in order to perform the column pass portion of the wavelet operation. The column pass is treated as 2x upsampling, 2x downsampling, column filtering. Again, the data, coefficient and output arrays should all be aligned to an 8 x 16-bit memory word. As is shown in Figure 18, data is taken directly from the memory word to the datapath. In other words, no input formatting of the data is necessary. Each coefficient is applied to all 8 MAC units in the parallel MACs configuration shown in Figures 8 and 10 through 13, or to the four MAC units illustrated in Figures 14 and 15. It takes N + 1 cycles to produce 8 outputs, where N is the number of filter taps in the wavelets kernel. There are N memory reads and 1 data memory write in each N + 1 cycles. Coefficient read frequency is the same as in row filtering: 1 read/8 cycles if N > 8, and zero otherwise. For wavelet reconstruction, the high and low frequency banks are processed separately with 2x upsampling filters. Finally, the two banks are combined using vector addition.
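One level of analysis and reconstruction along a row can be sketched in C to show the downsample/upsample structure. The document does not specify the wavelet kernel, so the analysis filters `h`, `g`, the synthesis filters `hs`, `gs`, zero-valued samples beyond the row ends, and the function names are all assumptions; the Haar pair used in the test is only an example.

```c
/* Analysis: filter with h (low) and g (high), keeping every second output
   (2x downsampling), so each bank has n / 2 samples. */
void wavelet_analysis(const double *x, int n, const double *h, const double *g,
                      int len, double *lo, double *hi)
{
    for (int k = 0; k < n / 2; k++) {
        double a = 0.0, b = 0.0;
        for (int j = 0; j < len; j++) {
            int idx = 2 * k + j;
            if (idx < n) { a += h[j] * x[idx]; b += g[j] * x[idx]; }
        }
        lo[k] = a;
        hi[k] = b;
    }
}

/* Reconstruction: 2x-upsample each bank, filter it with hs or gs, and add
   the two filtered banks together (vector addition). */
void wavelet_synthesis(const double *lo, const double *hi, int n,
                       const double *hs, const double *gs, int len, double *y)
{
    for (int i = 0; i < n; i++) y[i] = 0.0;
    for (int k = 0; k < n / 2; k++)
        for (int j = 0; j < len; j++) {
            int idx = 2 * k + j;           /* bank sample k lands at 2k */
            if (idx < n) y[idx] += hs[j] * lo[k] + gs[j] * hi[k];
        }
}
```

With the Haar pair h = (1, 1), g = (1, -1), hs = (0.5, 0.5), gs = (0.5, -0.5), the reconstruction returns the original row exactly.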

Figure 25 illustrates the IPP operation of the Inverse Discrete Cosine Transform (IDCT) in a row pass format. As shown, row-pass IDCT is implemented with the full matrix-vector approach. Thirty-two multiplications are used for each 8-point transform. Although this does not seem very efficient, it is a straightforward application of the IPP. Any one of the 8 MAC configurations shown in Figures 8 or 10-15 can be used to perform this operation, but the configuration of the split adder trees with butterfly shown in Figure 11 is preferred. This configuration can take advantage of symmetry in the transform to reduce the number of multiplications by half. In this case the IPP uses the post-multiply adders to implement the cross additions/subtractions. One input dataword is pulled from the wide memory word per cycle, and 8 coefficient words are used per cycle. Each 8-point transform takes 4 cycles to process. During these 4 cycles, one data memory read, one data memory write and 4 coefficient memory reads are performed. If the butterfly stage of reconfiguration is omitted (for example in Figures 14 and 15), the full 8-by-8 matrix multiplication method has to be used, resulting in 64 multiplications per 8-point transform and taking 8 or 16 cycles to perform each transform (with 8 or 4 MACs in IPP).

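The halving of multiplications by the butterfly can be checked numerically. The sketch below assumes the common orthonormal 8-point IDCT definition, x[n] = sum over k of (c_k / 2) X[k] cos((2n + 1) k pi / 16) with c_0 = 1/sqrt(2), since the document does not spell out its scaling. It compares the full 64-multiplication matrix-vector form with an even/odd form that uses 32 multiplications: one 4-tree accumulates the even-index terms, the other the odd-index terms, and a cross add/subtract yields x[n] and x[7 - n]. Coefficients are treated as precomputed coefficient words, and all names are hypothetical.

```c
/* cos(m * pi / 16) for any integer m, from a first-quadrant table
   (keeps the sketch free of libm). */
static double cos16(int m)
{
    static const double t[9] = {
        1.0, 0.98078528040323045, 0.92387953251128676, 0.83146961230254524,
        0.70710678118654752, 0.55557023301960222, 0.38268343236508977,
        0.19509032201612827, 0.0
    };
    m %= 32;
    if (m < 0) m += 32;
    if (m > 16) m = 32 - m;             /* cos is even and 2*pi periodic */
    return (m <= 8) ? t[m] : -t[16 - m];
}

/* Coefficient word for input X[k] and output x[n]. */
static double idct_coef(int k, int n)
{
    double ck = (k == 0) ? cos16(4) : 1.0;   /* cos(pi/4) == 1/sqrt(2) */
    return 0.5 * ck * cos16((2 * n + 1) * k);
}

/* Full matrix-vector IDCT: 64 multiplications per 8-point transform. */
void idct8_direct(const double X[8], double x[8])
{
    for (int n = 0; n < 8; n++) {
        double s = 0.0;
        for (int k = 0; k < 8; k++) s += idct_coef(k, n) * X[k];
        x[n] = s;
    }
}

/* Symmetry-based IDCT: 32 multiplications. Because
   idct_coef(k, 7 - n) == (-1)^k * idct_coef(k, n), the even-k and odd-k
   partial sums give x[n] and x[7 - n] with one cross add/subtract. */
void idct8_butterfly(const double X[8], double x[8])
{
    for (int n = 0; n < 4; n++) {              /* 4 cycles, 8 products each */
        double even = 0.0, odd = 0.0;
        for (int k = 0; k < 8; k += 2) {
            even += idct_coef(k, n) * X[k];          /* first 4-tree  */
            odd  += idct_coef(k + 1, n) * X[k + 1];  /* second 4-tree */
        }
        x[n]     = even + odd;                 /* cross add (adder 348)      */
        x[7 - n] = even - odd;                 /* cross subtract (adder 390) */
    }
}
```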
Figure 26 illustrates the IPP operation of the Discrete Cosine Transform (DCT) in a row pass format. Similar to the row-pass IDCT, row-pass DCT can be implemented with 32 multiplications or with 64 multiplications, depending on the configurability of IPP. When the dual 4-tree with pre-multiply adders configuration (Figure 11) is available, it should be used. The butterfly stage is disabled in this case. All 8 data words from each memory word are applied to the MACs, one to each. Coefficients are applied the same way, one different coefficient to each MAC. It takes 4 cycles to process one 8-point transform in this configuration. Without the pre-multiply adders (for example in Figures 14 and 15), each 8-point transform will require 64 multiplications and take 8 or 16 cycles, depending on the number of MACs in the IPP.
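For the forward DCT the symmetry is exploited before the multipliers instead of after them. The sketch below assumes the usual orthonormal 8-point DCT-II scaling, which the document does not state, and uses hypothetical names. The pre-multiply adders form u[n] = x[n] + x[7 - n] and v[n] = x[n] - x[7 - n], so each even-index output needs only the four u values and each odd-index output only the four v values. That gives 32 multiplications in total, one even/odd output pair per cycle over 4 cycles.

```c
/* cos(m * pi / 16) for any integer m, from a first-quadrant table. */
static double cos16(int m)
{
    static const double t[9] = {
        1.0, 0.98078528040323045, 0.92387953251128676, 0.83146961230254524,
        0.70710678118654752, 0.55557023301960222, 0.38268343236508977,
        0.19509032201612827, 0.0
    };
    m %= 32;
    if (m < 0) m += 32;
    if (m > 16) m = 32 - m;
    return (m <= 8) ? t[m] : -t[16 - m];
}

/* Coefficient word for output X[k] and input x[n]. */
static double dct_coef(int k, int n)
{
    double ck = (k == 0) ? cos16(4) : 1.0;   /* 1/sqrt(2) for k == 0 */
    return 0.5 * ck * cos16((2 * n + 1) * k);
}

/* Full 8x8 matrix method: 64 multiplications. */
void dct8_direct(const double x[8], double X[8])
{
    for (int k = 0; k < 8; k++) {
        double s = 0.0;
        for (int n = 0; n < 8; n++) s += dct_coef(k, n) * x[n];
        X[k] = s;
    }
}

/* Pre-add method: 32 multiplications. */
void dct8_preadd(const double x[8], double X[8])
{
    double u[4], v[4];
    for (int n = 0; n < 4; n++) {            /* pre-multiply adders */
        u[n] = x[n] + x[7 - n];
        v[n] = x[n] - x[7 - n];
    }
    for (int j = 0; j < 4; j++) {            /* one output pair per cycle */
        double even = 0.0, odd = 0.0;
        for (int n = 0; n < 4; n++) {
            even += dct_coef(2 * j, n) * u[n];       /* first 4-tree  */
            odd  += dct_coef(2 * j + 1, n) * v[n];   /* second 4-tree */
        }
        X[2 * j]     = even;
        X[2 * j + 1] = odd;
    }
}
```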

Figure 27 illustrates the IPP operation of IDCT in column format, Single Instruction Multiple Data (SIMD). The parallel configuration of 8 MACs shown in Figure 8, with some modifications in the accumulators, is needed to take advantage of symmetry in the transform. Each MAC unit requires 8 accumulators, and each accumulating adder needs to take both inputs from the 8 accumulators. With such hardware capability, during the first 4 cycles, one 4x4 matrix will yield the first 4 points. During the next 4 cycles, another 4x4 matrix will produce the next 4 points. During cycles 9 and 10, the accumulating adders cross add/subtract and combine the outputs. Therefore, in 10 cycles, a pair of output results (16 points) is produced. During those 10 cycles, 8 data reads, 2 data writes and 8 coefficient reads are performed. Without the hardware modification, it takes 64 multiplications per 8-point transform, so 16 points of output will take 16 cycles on the 8-MAC version of IPP and 32 cycles on the 4-MAC version of IPP. In either case the separate MAC configuration is used.

In addition to the datapath configurability and input formatting options, an efficient control and address generation scheme is devised for IPP. This scheme reduces the implementation cost of hardware control and provides an easy-to-use programming model for IPP.

All computation shall occur inside a nested for loop. Timing for accumulator initialization and write-out shall be controlled by conditioning on the loop variables. Initialization shall happen when certain loop variables match their beginning values. Write-out shall happen when the same set of variables match their ending values. Circulating accumulators can be specified with the innermost loop count indexing the accumulators. All address increments for input data, coefficients and results can be specified in terms of "when" and "how much", and the "when" is associated with the loop variables. The following is pseudocode of a skeleton of the control structure for IPP that illustrates these concepts.

    dptr = dptr_init;   /* initial value of pointers */
    cptr = cptr_init;
    optr = optr_init;

    for (i1 = 0; i1 <= lp1end; i1++) {
      for (i2 = 0; i2 <= lp2end; i2++) {
        for (i3 = 0; i3 <= lp3end; i3++) {
          for (i4 = 0; i4 <= lp4end; i4++) {

            /* memory read and input formatting */
            x[0..7] = dptr[0..7];
            /* or dptr[0], dptr[0,1], dptr[0,1,2,3] distributed */
            y[0..7] = cptr[0..7];

            /* accumulator initialization */
            if (initialize_acc)
              acc[i4*accmode][0..7] = rnd_add[0..7];

            /* operation-accumulate */
            acc[i4*accmode][0..7] += x[0..7] op y[0..7];

            /* write back */
            if (write_back)
              optr[0..7] = saturate_round(acc[i4*accmode][0..7]);
              /* or just 1, 2 or 4 outputs */

            /* pointer updates */
            dptr += ...;
            cptr += ...;
            optr += ...;
          }
        }
      }
    }

The initialize_acc condition is tested by matching a specified subset of loop count variables with their beginning values (0). The parameter acc_loop_level indicates whether none, i4, i4 and i3, or i4, i3 and i2 should be tested. This same subset of loop count variables is tested against their ending values to supply the write_back condition.

The pointer updates also involve comparing loop count variables. For example, for 4 levels of loops we can supply up to 4 sets of address modifiers for the data pointer, dptr. Each set consists of a subset of loop count variables that must match their ending values, and the amount by which dptr should be incremented when the condition is true. The same capability is given to the coefficient pointer cptr and the output pointer optr.

The above pseudocode uses parameters that are either statically set with the Write_parameters command or encoded in an IPP computational command. These parameters include the ending values of the loop count variables (the beginning value is always 0), accmode (single/circulating accumulators), op (multiply/add/subtract/absdiff), acc_loop_level and the address modifiers mentioned above.

All the supported imaging/video functions can be written in the above form and then translated into IPP commands by properly setting the parameters. The task of software development for IPP can follow this methodology.

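To make the control scheme concrete, the following C sketch runs the four-level loop nest and derives initialize_acc, write_back and the data-pointer increments purely from loop-variable comparisons. It is an illustrative model, not the IPP command format: `ipp_addr_mod`, `ipp_run_control`, the bit-mask encoding of loop-variable subsets and the first-match rule for address modifiers are all assumptions. The test configures a 3-tap row filter over two 8-word output blocks (i3 selects the block, i4 the tap), so the data address is 8*i3 + i4.

```c
/* Hypothetical model of the IPP loop control. Loop variables i[0..3]
   stand for i1..i4; each runs from 0 to end[d] inclusive. */
typedef struct {
    unsigned mask;   /* bit d set: i[d] must equal end[d] */
    long     step;   /* pointer increment applied when the condition holds */
} ipp_addr_mod;

/* True when every loop variable selected by mask equals its value in val. */
static int all_match(const int i[4], const int val[4], unsigned mask)
{
    for (int d = 0; d < 4; d++)
        if (((mask >> d) & 1u) && i[d] != val[d])
            return 0;
    return 1;
}

/* Assumption: modifiers are listed most specific first, and the first set
   whose condition holds supplies the increment (0 if none holds). */
static long addr_step(const int i[4], const int end[4],
                      const ipp_addr_mod *mods, int nmods)
{
    for (int m = 0; m < nmods; m++)
        if (all_match(i, end, mods[m].mask))
            return mods[m].step;
    return 0;
}

/* Runs the loop nest, logging per innermost iteration the data-pointer
   offset and the initialize_acc / write_back flags. acc_loop_level = 0..3
   tests none, i4, (i4, i3) or (i4, i3, i2). Returns the iteration count. */
int ipp_run_control(const int end[4], int acc_loop_level,
                    const ipp_addr_mod *dmods, int ndmods,
                    long *dptr_log, int *init_log, int *wb_log)
{
    static const int zero[4] = {0, 0, 0, 0};
    unsigned test_mask = 0;
    for (int l = 0; l < acc_loop_level; l++)
        test_mask |= 1u << (3 - l);           /* innermost variables first */

    int i[4], n = 0;
    long dptr = 0;                            /* offset from dptr_init */
    for (i[0] = 0; i[0] <= end[0]; i[0]++)
      for (i[1] = 0; i[1] <= end[1]; i[1]++)
        for (i[2] = 0; i[2] <= end[2]; i[2]++)
          for (i[3] = 0; i[3] <= end[3]; i[3]++) {
              dptr_log[n] = dptr;                          /* read address */
              init_log[n] = all_match(i, zero, test_mask); /* initialize_acc */
              wb_log[n]   = all_match(i, end, test_mask);  /* write_back */
              dptr += addr_step(i, end, dmods, ndmods);    /* pointer update */
              n++;
          }
    return n;
}
```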
I claim:

1. An image processing peripheral comprising:

a plurality of pairs of multiply accumulate circuits connected in parallel, each pair of multiply accumulate circuits comprising:

first adder pairs, each one of each adder pair having first and second inputs receiving respective first and second inputs having a first predetermined number of bits and an output producing a sum or a difference of said inputs;

first multiplier pairs, corresponding to said first adder pairs, each multiplier of each multiplier pair having a first input of said sum or difference of said first adders and a second input of a constant predetermined number and producing a product output;

second adder pairs, corresponding to said first multiplier pairs, each one adder of said adder pair having first and second inputs receiving respective first multiplier outputs from one or the other of said multipliers of said corresponding multiplier pair as said first input;

wherein said one of said pair of second adders receives an output from a multiplexer, said multiplexer having one input from a product of the other multiplier of said first multiplier pairs and a second input from an accumulated sum of said one adder of said second adder pairs, as a second input of said one adder of said second adder pair; and

wherein said other of said pair of second adders receives outputs from a first and a second multiplexer, said first multiplexer having one input from said other multiplier of said first multiplier pair and a second input from the sum of said one adder of said second adder pair, said second multiplexer having one input from the accumulated sum of said other adder of said second adder pair and a second input from the sum of a one adder of a second pair of second adder pairs, as a second output; and

wherein said second adder pairs produce a sum output.

2. An image processing peripheral comprising:

a plurality of pairs of multiply accumulate circuits connected in parallel, each pair of multiply accumulate circuits comprising:

first adder pairs, each one adder of each adder pair having first and second inputs receiving respective first and second inputs having a first predetermined number of bits and an output producing a sum or a difference of said inputs;

first multiplier pairs, corresponding to said first adder pairs, each multiplier of each multiplier pair having a first input of said sum or difference of said first adders and a second input of a constant predetermined number and producing a product output;

second adder pairs, corresponding to said first multiplier pairs, each adder pair implementing separate accumulation of said products of said first multiplier pairs, yielding an accumulated sum.

3. An image processing peripheral comprising:

a plurality of pairs of multiply accumulate circuits connected in parallel, each pair of multiply accumulate circuits comprising:

first adder pairs, each one adder of each adder pair having first and second inputs receiving respective first and second inputs having a first predetermined number of bits and an output producing a sum or a difference of said inputs;

first multiplier pairs, corresponding to said first adder pairs, each multiplier of each multiplier pair having a first input of said sum or difference of said first adders and a second input of a constant predetermined number and producing a product output;

second adder pairs, corresponding to said first multiplier pairs, each adder pair implementing summation of said products of said pairs of multipliers and then accumulating the sums of said summations.

4. An image processing peripheral comprising:

a plurality of pairs of multiply accumulate circuits connected in parallel, each pair of multiply accumulate circuits comprising:

first adder pairs, each one adder of each adder pair having first and second inputs receiving respective first and second inputs having a first predetermined number of bits and an output producing a sum or a difference of said inputs;

first multiplier pairs, corresponding to said first adder pairs, each multiplier of each multiplier pair having a first input of said sum or difference of said first adders and a second input of a constant predetermined number and producing a product output;

second adder pairs, corresponding to said first multiplier pairs, each adder pair implementing accumulation of sums of two products.

ABSTRACT

The proposed architecture is integrated onto a Digital Signal Processor (DSP) as a coprocessor (140) to assist in the computation of sum of absolute differences, symmetrical row/column Finite Impulse Response (FIR) filtering with a downsampling (or upsampling) option, row/column Discrete Cosine Transform (DCT)/Inverse Discrete Cosine Transform (IDCT), and generic algebraic functions. The architecture is called IPP, which stands for image processing peripheral, and consists of 8 multiply-accumulate hardware units connected in parallel and routed and multiplexed together. The architecture can be dependent upon a Direct Memory Access (DMA) controller (120) to retrieve and write back data from/to DSP memory without intervention from the DSP core (110). The DSP can set up the DMA transfer and IPP/DMA synchronization in advance, then go on to its own processing task. Alternatively, the DSP can perform the data transfers and synchronization itself by synchronizing with the IPP architecture on these transfers. This architecture implements 2-D filtering, symmetrical filtering, short filters, sum of absolute differences, and mosaic decoding more efficiently than the previously disclosed architectures of the prior art.